ABSTRACT


This thesis considers the problem of scaling rule-based inference to large quantities
of RDF data found on the Semantic Web. The general approach is one of data
parallelism, that is, dividing data among processors such that the collective results of
the processors' individual inferences are the same as though inference were performed
sequentially. In this way, theoretically speaking, more processors can be added to
accommodate more data.

The problem is first considered from the perspective of the operational semantics
of inference with production rules. The question is asked, under what conditions
is embarrassingly parallel inference guaranteed to be correct? Sufficient conditions
are determined and proven at both a fine-grained level close to the basic operational
semantics and a more coarse-grained level that applies directly to rules. The
conditions are placed on the relationship between rules and distribution schemes, that
is, the way in which data is assigned to processors.

Then, a special class of distribution schemes is considered called replication
schemes. Replication schemes require that individual data either be replicated to
all processors or placed arbitrarily on some processor(s). The aforementioned
conditions are then reformulated to consider replication schemes, which reveals that
testing the conditions for replication schemes is reducible to satisfiability (SAT),
and not only SAT but 2SAT. An augmented version of this reduction, which is a
reduction to 3SAT, also accounts for the possibility of eliminating some rules in order
to improve parallelization. These reductions, along with a proposed methodology for
restricting rules, are used to derive restricted versions of the RDFS and OWL2RL
rules that are amenable to parallel inference.

Finally, an evaluation is performed that tests these theoretical findings for
restricted versions of RDFS and OWL2RL inference on two large, well-known datasets
exceeding a billion triples: LUBM10K and BTC2012. The LUBM10K dataset
represents an optimistic case, meaning that if performance is poor with LUBM10K, then
it will likely be poor on many datasets. On the other hand, the BTC2012 dataset
represents a pessimistic case, meaning that if performance is good with BTC2012,
then it is likely that performance will be good with other datasets. While the usual
scalability metrics are used (speedup, efficiency, etc.), the Karp-Flatt metric reveals
that inference is almost entirely parallel for LUBM10K data, demonstrating the
practical feasibility of the theoretical findings. However, for BTC2012, it must be
ensured that there is sufficient memory and load-balancing to achieve this high level
of scalability on distributed memory architectures. Regardless, for feasible cases,
very low times are achieved for LUBM10K (seconds) and BTC2012 (minutes).


CHAPTER  1 
INTRODUCTION 

Since its conception, the World Wide Web (Web) has presented a persistent problem:
given the vast amount of information published, how does one find anything of
a particular interest? Traditionally, the Web has been viewed as a set of interlinked
documents in which the documents are - from the perspective of algorithms - “bags
of words.” Such a view of the Web renders the information meaningless to machines,
compelling the need for algorithms that ignore the inherent meaning of the information
and instead rely on probabilities or cleverly devised ranking metrics based on
structural characteristics of the Web. Modern search engines have essentially solved
this particular problem.

The  Semantic  Web  [1]  -  the  Web  as  it  is  today  -  contains  not  only  “bags 
of  words”  but  also  structured  data  with  explicit  semantics,  in  the  form  of  Resource 
Description  Framework  (RDF)  [2]  triples.  These  explicit  semantics  allow  algorithms 
to  discover,  in  some  sense,  the  precise  meaning  of  the  data.  Utilizing  this  information 
to  improve  search  and  data  integration  is  still  an  open  problem.  While  the  structured 
data  has  an  explicit  semantics,  these  semantics  often  imply  other  data.  In  other 
words,  additional  data  need  to  be  inferred  to  more  fully  reveal  the  explicit  data’s 
meaning  and  entailments. 

It is arguably the case that the most common way of semantically enriching
data on the Semantic Web is by providing an ontology in the Web Ontology
Language (OWL) [3] that describes the data. OWL is based on Description Logic
(DL) [4] which has the desirable property of decidability. However, this characteristic
significantly limits expressivity, and so rule-based inference has been turned to
as an alternative. This is particularly evidenced by the recent World Wide Web
Consortium (W3C) Recommendation of the Rule Interchange Format (RIF) [5].

The  topic  of  rule-based  inference  has  a  long  research  history  rooted  in  artificial 
intelligence  (particularly  logic  programming  and  theorem  proving)  and  databases. 
Rules  can  be  loosely  thought  of  as  If-Then  statements  in  which  some  condition 
implies some consequence. Rules can be roughly categorized into logic rules and
production  rules  (or  more  generally,  rules  with  actions).  This  proposal  focuses  on 
production  rules,  which  are  capable  of  expressing  some  other  forms  of  rules  like 
Datalog  [6]. 

The general approach to scaling inference to larger amounts of data has been to
employ parallelism. Scalability of a system is determined based on the kind of scaling
under consideration, traditionally strong scaling in which execution time should
decrease as the number of processors increases, holding workload fixed. In strong scaling,
the ideal¹ case is that $T_N = T_1 / N$, where $T_1$ is the execution time on a single
processor, $N$ is a positive number of processors, and $T_N$ is the execution time on
$N$ processors. A more appropriate notion of scalability for the Semantic Web is data
scaling [7] in which the ratio of dataset size to number of processors is fixed, and
execution time is desired to remain constant. In this case $T_N = T_1$ is the ideal case,
where $T_1$ is the execution time on a single processor at capacity, and $T_N$ is the
execution time on $N$ processors at capacity. The reason for this alternative perspective
on scalability is that handling the growth of data is the primary challenge, and reducing
execution time is secondary. Either way, let $\tau_N$ denote the execution time on $N$
processors, and let $\tau^*$ be its ideal value. In this way, the following arguments can be
made independent of the notion of scalability. $\tau_N > \tau^*$ is an indication that there
is some performance cost to parallelizing the computation. That is, in practice,
$\tau_N = \tau^* + \mathrm{cost}(N)$.
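
The two notions of scalability differ only in the ideal time against which a measured
time is compared. The following is a minimal sketch of that bookkeeping, assuming
times are plain floating-point numbers in some common unit; the function names
`ideal_time` and `parallel_cost` are hypothetical and chosen only for illustration.

```python
def ideal_time(t1: float, n: int, scaling: str = "strong") -> float:
    """Ideal execution time on n processors.

    strong: the workload is fixed, so the ideal is t1 / n.
    data:   the workload grows with n, so the ideal is t1 (constant time).
    """
    if n < 1:
        raise ValueError("n must be a positive number of processors")
    if scaling == "strong":
        return t1 / n
    if scaling == "data":
        return t1
    raise ValueError(f"unknown scaling notion: {scaling!r}")


def parallel_cost(measured: float, t1: float, n: int, scaling: str = "strong") -> float:
    """cost(N): the measured time on n processors minus its ideal value."""
    return measured - ideal_time(t1, n, scaling)
```

A negative result from `parallel_cost` would correspond to the superlinear case
mentioned in the footnote.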

Traditional  research  in  parallel,  rule-based  inference  (mostly  from  the  late 
1980s  to  mid-1990s)  focused  on  the  general  case,  making  no  assumptions  about  the 
rules  or  data  [8,  9,  10,  11,  12,  13,  14,  15,  16].  Handling  large  amounts  of  data  was 
considered  a  problem  of  providing  correct  inference  while  minimizing  cost(N)  by 
improving  load-balancing  and  reducing  communication.  Some  approaches  required 
elimination  of  redundant  work  [12,  13]. 

However,  compared  to  the  focus  of  traditional  research,  the  Semantic  Web  is  a 
relatively  special  case  in  which  the  amount  of  data  far  exceeds  the  number  of  rules, 

¹ Ideal in a theoretical sense. In practice, it is possible for $T_N$ to be smaller if
superlinear speedup is achieved by spreading out the data enough so that each processor
can better utilize cache.
and even without consideration of data, the number of rules can be relatively few
as  in  the  RDF  Schema  (RDFS)  [17]  rules.  This  change  in  focus  is  expressed  in  the 
cleverly  coined  hypothesis  of  the  Semantic  Web  originating  with  co-inventor  James 
Hendler:  “a  little  semantics  goes  a  long  way”  [18].  In  the  context  of  this  work,  the 
“little  semantics”  is  the  few  rules,  and  the  “long  way”  is  the  amount  of  data. 

As a result, some recent works in parallel, rule-based inference on the Semantic
Web have taken a different perspective of parallelism than in traditional
research on parallel, rule-based inference. In the presence of few rules and relatively
small ontologies, good scalability can sometimes be achieved simply by replicating
portions of the data to all processors and allowing them to perform inference
independently, that is, in an embarrassingly parallel fashion [19, 20, 21]. In that case,
except for possible contention during the initial data distribution, cost(N) reflects
only data-dependent costs like load imbalance caused by skew in data distribution,
or redundant work caused by replication of (initial or inferred) data. Therefore,
assuming each processor runs the best sequential inference system, cost(N) is purely
determined by data distribution. Thus, determination of a distribution scheme is
important and leads to the first contribution.

The first, novel contribution of this thesis is to determine sufficient
conditions under which a data distribution scheme supports
correct (embarrassingly) parallel inference for a given set
of rules, independent of features of the data.

Previous  works  on  rule-based  inference  on  the  Semantic  Web  have  typically 
placed  restrictions  on  the  data  [19,  20,  21]  in  order  to  achieve  complete  parallel 
inference  with  some  fixed  set  of  rules.  This  work  instead  makes  no  assumptions 
about  the  data. 
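
The embarrassingly parallel strategy described above can be illustrated with a small
sketch. It is not the system evaluated in this thesis: it assumes triples are plain
tuples of strings, uses only two RDFS rules (rdfs9 and rdfs11, which concern
`rdfs:subClassOf`), replicates every `rdfs:subClassOf` triple to all partitions, and
simulates the processors with a sequential loop. The function names are hypothetical.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def closure(triples):
    """Naive forward chaining to a fixpoint using rules rdfs9 and rdfs11."""
    facts = set(triples)
    while True:
        new = set()
        subclass_pairs = [(s, o) for (s, p, o) in facts if p == SUBCLASS]
        for (c, d) in subclass_pairs:
            for (s, p, o) in facts:
                if p == TYPE and o == c:
                    # rdfs9: x type c, c subClassOf d => x type d
                    new.add((s, TYPE, d))
                elif p == SUBCLASS and s == d:
                    # rdfs11: c subClassOf d, d subClassOf e => c subClassOf e
                    new.add((c, SUBCLASS, o))
        if new <= facts:
            return facts
        facts |= new


def distribute(triples, n):
    """Replicate subClassOf triples everywhere; place the rest round-robin."""
    schema = {t for t in triples if t[1] == SUBCLASS}
    rest = sorted(t for t in triples if t[1] != SUBCLASS)
    parts = [set(schema) for _ in range(n)]
    for i, t in enumerate(rest):
        parts[i % n].add(t)
    return parts


def parallel_closure(triples, n):
    """Close each partition independently (no communication), then union."""
    result = set()
    for part in distribute(triples, n):
        result |= closure(part)
    return result
```

For these two rules, `parallel_closure(data, n)` equals `closure(data)` for any `n`,
because every rule instance joins at most one non-replicated triple with replicated
ones. Rules that join two arbitrarily placed triples would not enjoy this property
under the same scheme.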

In many interesting cases, though, the only solution is to replicate all the data
to all processors, which defeats the purpose of parallelization (which is to grow
the dataset beyond the capacity of a single machine, to improve performance, or both).
In such cases, one might consider sacrificing some semantics of the program (rules)
to achieve better parallelization. This leads to the next contribution.

The second, novel contribution of this thesis is a method (partially
formal and partially heuristic) that can be used to restrict
a set of rules such that the restricted version is amenable
to parallelization.

This  second  contribution  is  achieved  by  considering  a  special  case  of  data 
distribution  which  allows  the  conditions  for  parallelization  to  be  reduced  to  the 
satisfiability  problem  (SAT).  Combined  with  an  intuition-guided  methodology  and 
a  few  optimization  heuristics,  this  approach  is  used  to  derive  restricted  versions  of 
the  RDFS  and  OWL  2  Rule  Language  (OWL2RL)  [3]  rules  that  are  amenable  to 
parallelization. 

The third, novel contribution of this thesis is a performance
evaluation of parallel inference using restricted RDFS and OWL2RL
rule sets to empirically test the practical value of the aforementioned
theoretical findings.

The evaluation is run on a large symmetric multiprocessing (SMP) machine
and a Blue Gene/Q [22]. The datasets used in the evaluation are the synthetic
Lehigh University Benchmark (LUBM) [23] dataset (unrealistically optimistic, best
case scenario) and the 2012 Billion Triples Challenge dataset (BTC2012, an abysmally
difficult, worst-case scenario) [24]. Performance metrics include execution time,
relative speedup, efficiency, the Karp-Flatt metric [25], and the recently proposed
growth efficiency [7]. It is found that a high degree of scalability is achievable with
these restricted rule sets when there is sufficient memory and load-balancing.
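
For reference, three of the metrics named above can be computed as in the following
sketch. It uses the common textbook definitions with a single-processor baseline,
which is an assumption; the evaluation in this thesis may measure relative speedup
against a different baseline. Growth efficiency [7] is omitted.

```python
def speedup(t1: float, tn: float) -> float:
    """Speedup of a run taking tn relative to a baseline run taking t1."""
    return t1 / tn


def efficiency(t1: float, tn: float, n: int) -> float:
    """Speedup per processor; 1.0 is ideal under strong scaling."""
    return speedup(t1, tn) / n


def karp_flatt(t1: float, tn: float, n: int) -> float:
    """Experimentally determined serial fraction e = (1/s - 1/n) / (1 - 1/n).

    Values near 0 indicate that the computation is almost entirely parallel.
    """
    if n < 2:
        raise ValueError("the Karp-Flatt metric needs at least two processors")
    s = speedup(t1, tn)
    return (1.0 / s - 1.0 / n) / (1.0 - 1.0 / n)
```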


CHAPTER  2 

HISTORICAL  REVIEW 

There  are  three  main  categories  for  historical  review:  distributed  logic  programs, 
parallel  production  rule  systems,  and  recent  work  in  scalable,  rule-based  inference 
on  the  Semantic  Web.  Before  discussing  related  works  in  these  particular  fields, 
though,  the  taxonomy  of  parallel  strategies  of  deduction  given  by  Bonacina  [26]  is 
discussed  to  give  context  to  the  strategies  employed  in  previous  work.  Kotoulas  et 
al.  [27]  similarly  cover  the  discussion  from  section  2.1  and  the  review  of  related 
works  in  section  2.4. 

2.1  Taxonomy  of  Parallel  Deduction 

Bonacina [26] describes three levels of parallelism differing in granularity:
term-level parallelism (fine-grained), clause-level parallelism (mid-grained), and
search-level parallelism (coarse-grained). Term-level parallelism seeks to parallelize
frequent, low-level operations such as term rewriting and unification. For example,
unifying terms $p(a_1, \ldots, a_m)$ and $q(b_1, \ldots, b_n)$ consists of checking to
make sure $p = q$, $m = n$, and then attempting to unify each $a_i$ and $b_i$ for
$1 \le i \le n$. If there exist $j, k$ such that $j \ne k$ and there is no dependency
between $a_j$ and $a_k$, and there is no dependency between $b_j$ and $b_k$, then $a_j$
and $b_j$ can be unified independently of $a_k$ and $b_k$, which means the two
unifications can be executed in parallel.
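
The unification steps just described can be sketched as follows. This is a simplified
illustration, not a production unifier: compound terms are assumed to be tuples of the
form `(functor, arg1, ..., argN)`, constants are strings, there is no occurs check, and
"no dependency" is interpreted here as "no shared variables", which is an assumption.
The code itself runs sequentially; `independent` only identifies argument pairs that
could be handed to different workers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


def walk(t, subst):
    """Follow variable bindings until an unbound variable or non-variable."""
    while isinstance(t, Var) and t in subst:
        t = subst[t]
    return t


def unify(s, t, subst=None):
    """Return a substitution (dict) unifying s and t, or None on failure."""
    subst = dict(subst or {})
    s, t = walk(s, subst), walk(t, subst)
    if s == t:
        return subst
    if isinstance(s, Var):
        subst[s] = t
        return subst
    if isinstance(t, Var):
        subst[t] = s
        return subst
    if isinstance(s, tuple) and isinstance(t, tuple):
        # Check p = q and m = n before descending into the arguments.
        if s[0] != t[0] or len(s) != len(t):
            return None
        for a, b in zip(s[1:], t[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None


def variables(t):
    """Set of variables occurring in a term."""
    if isinstance(t, Var):
        return {t}
    if isinstance(t, tuple):
        return set().union(*(variables(a) for a in t[1:]))
    return set()


def independent(pair1, pair2):
    """True if two (a, b) argument pairs share no variables."""
    vars1 = variables(pair1[0]) | variables(pair1[1])
    vars2 = variables(pair2[0]) | variables(pair2[1])
    return vars1.isdisjoint(vars2)
```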

Clause-level  parallelism  seeks  to  parallelize  individual  inference  steps.  This 
could  be  parallelizing  search  for  satisfactions  of  different  subgoals  in  a  rule,  or  it 
could  be  parallelizing  search  for  satisfactions  of  the  same  subgoal.  This  is  discussed 
more  in  section  2.2.1. 

Search-level parallelism seeks to divide the entire search space among processors.
Search-level parallelism has the distinguishing characteristic that its coarse-grained
nature lends itself to distributed environments. However, such parallelism
is hardly as straightforward as the previous two. The manner in which the search
space is divided depends on the goal to be achieved and the means by which it is to
be achieved. There are two axes to search-level parallelism: whether the process is
homogeneous or heterogeneous, and whether multi-search or distributed search (or
both) are used.

In  homogeneous,  search-level  parallelism,  all  processors  use  the  same  inference 
system.  For  example,  they  all  use  a  DL  reasoner,  or  they  all  use  a  Prolog  engine. 
In  contrast,  heterogeneous  systems  combine  different  inference  systems. 

In multi-search, search-level parallelism, processors may employ different search
plans. In other words, even if each processor has the same inference system (e.g., a
DL reasoner), each processor may apply rewrite rules in different orders or to different
formulas. The main point is that the processors may handle non-determinism
differently. On the other hand, distributed search has nothing to do with
non-determinism (although it does not preclude multi-search). The defining characteristic
of distributed search is that processors differ in distribution of the problem, for
example, by data and/or rule distribution. Distributed search often requires the
processors to communicate in order to ensure completeness or improve load-balancing.

Thus, search-level parallelism can differ by diversity of inference systems,
search plans, and/or problem distribution. Since it allows for distribution of data,
search-level parallelism is the form of parallelism focused on herein, specifically
homogeneous, non-multi-search, distributed search. In other words, I am seeking to
achieve parallelism solely by means of data distribution. Each processor is assumed
to have the same inference system and the same search plan (i.e., same way of
handling non-determinism).

2.2  Distributed  Logic  Programs 

Parallelism for improving the performance of reasoning with logic programs
has been well-studied as shown in the survey papers [10, 26]. This section considers
the most relevant logic programming languages, namely Prolog [10] and Datalog [6].

2.2.1  Prolog 

Strategies for parallel, backward-chaining inference in Prolog have been surveyed
by Gupta et al. [10]. Prolog rules have the form $H \text{ :- } B_1, \ldots, B_n$, where $H$
and each $B_i$ are atomic formulas. $H$ is called the head or consequent, and
$B_1, \ldots, B_n$ is called the body or antecedent. All variables are implicitly
universally quantified to the scope of the rule. Gupta et al. delineate three forms of
parallelism: unification parallelism, and-parallelism, and or-parallelism.

Unification parallelism is a form of term-level parallelism, an example of which
has already been given in section 2.1. And-parallelism is concerned with satisfying
the subgoals ($B_i$) in the body of a rule in parallel. Doing so is not necessarily
straightforward since subgoals in the body often share variables, and thus, they
cannot be satisfied independently. In contrast, or-parallelism certainly allows for
independent processing. If a (sub)goal can be unified to the heads of multiple rules,
then search for satisfactions of these rules can be performed independently for each
rule. If the goal is the query being posed to the Prolog engine, then or-parallelism
in this case is search-level parallelism. If the goal is a subgoal of a rule, then
or-parallelism in this case is clause-level parallelism.

Note  that  of  these  approaches,  none  of  them  use  a  distributed  search  strategy. 
Thus,  these  forms  of  parallelism  are  orthogonal  to  the  focus  of  this  work. 

2.2.2  Datalog 

Unlike Prolog, Datalog has a significant history of distributed search. Relevant
works and strategies have been summarized in [11]. The main approaches have been
program restriction and predicate decomposability.

The goal of program restriction is to effectively distribute the firing of rule
instances (i.e., the drawing of individual inferences from specific rules) to processors by
appending some conditions to the rule bodies. Such an approach usually uses some
hash-distribution scheme for assigning facts to processors, effectively implementing
a distributed hash-join. Inferences then need to be communicated between processors
to ensure completeness; in other words, inferences are also hash-distributed.
Attempts to optimize restricted programs focus on minimizing communication.
Restricted programs are an example of distributed search. Unlike embarrassingly
parallel inference, program restriction simply assumes communication between
processors (although it does not preclude the possibility of restricting a program such
that no communication is necessary).
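
Program restriction can be illustrated on the familiar ancestor program. The sketch
below is a sequential simulation under several assumptions: the hash function `h` is a
toy stand-in, the restriction is placed on the join variable Y, and "sending" an
inference simply means adding it to another processor's local set. None of the names
come from the works surveyed in [11].

```python
def h(value: str, n: int) -> int:
    """Toy deterministic hash (Python's built-in str hash is salted per process)."""
    return sum(value.encode()) % n


def restricted_closure(par, n):
    """Ancestor closure where processor i fires only instances with h(Y) = i.

    Rules:  anc(X, Y) :- par(X, Y).
            anc(X, Z) :- par(X, Y), anc(Y, Z), h(Y) = i.
    Returns the set of anc facts and the number of inferences communicated.
    """
    par_at = [set() for _ in range(n)]  # par(X, Y) is stored at processor h(Y)
    anc_at = [set() for _ in range(n)]  # anc(Y, Z) is stored at processor h(Y)
    for (x, y) in par:
        par_at[h(y, n)].add((x, y))
        anc_at[h(x, n)].add((x, y))     # base rule, keyed on the first argument
    messages = 0
    changed = True
    while changed:
        changed = False
        for i in range(n):
            # Local hash-join on Y; both operands live at processor i.
            derived = {(x, z)
                       for (x, y) in par_at[i]
                       for (y2, z) in anc_at[i] if y == y2}
            for (x, z) in derived:
                dest = h(x, n)          # inferences are hash-distributed too
                if (x, z) not in anc_at[dest]:
                    anc_at[dest].add((x, z))
                    changed = True
                    if dest != i:
                        messages += 1
    return set().union(*anc_at), messages
```

The message count makes the contrast with embarrassingly parallel inference concrete:
completeness here depends on inferences crossing processor boundaries.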

In contrast, predicate decomposability as introduced by Wolfson and Silberschatz
[13] aims to distribute data to processors such that embarrassingly parallel
inference for a given predicate is correct and each processor’s closure for that
predicate is disjoint from every other processor’s closure. (Closure is the set of all possible
inferences for the given facts and rules, which is always derivable in Datalog). Thus,
it is also an example of distributed search. In [13], Wolfson and Silberschatz
characterized three types of single rule programs (sirups) for which predicate
decomposability is achievable. Later, Wolfson and Ozeri [12] added an additional criterion,
that each processor must have at least one fact with the predicate being decomposed.
In the same paper, they present a theorem with sketch proof stating that it
is undecidable whether a program is decomposable on a given predicate.

In contrast to predicate decomposability, the work proposed herein seeks only
to achieve soundness (implied in Datalog) and completeness; there is no strict
requirement of disjointness or whether each processor has at least one fact for some
predicate. The logical desirability behind the disjointness criterion is that if each
inference is drawn by exactly one processor, then no redundant work is performed,
and assuming perfect load-balancing, a high parallel efficiency can be achieved. This
is a very strict requirement, one that surely cannot be met in many cases.

2.3  Parallel  Production  Rule  Systems 

Unlike with (typical recursive) Datalog, Production Rule Systems (PRSs) allow
functions, negation of formulas, and retraction of facts.² Therefore, inference
in a PRS may not be decidable, and the order in which rule instances (rules with
bound variables) are fired may have an impact on which facts are inferred and when.
A mechanism called a conflict resolution strategy (CRS) is used to define how rule
instances are fired.

PRSs  follow  a  cycle  of  match,  resolve,  and  fire.  First,  rules  are  matched  to  the 
knowledge  base  (factbase,  set  of  facts)  to  derive  all  possible  rule  instances,  then  the 

² Variants of Datalog also allow forms of negation, but to the best of my knowledge, they have
been generally unstudied in the context of parallelism.
CRS determines which rule instances should be fired and in what order, and then
the selected rules are fired. Most work in parallel PRSs centers around parallelizing
the OPS5 PRS [28] which used the well-known RETE algorithm [29] for the match
phase [8]. Much of the work in parallelizing OPS5 has focused on the match phase
since it accounted for the majority of execution time [9]. Other work has focused on
parallelizing rule firing, a much more difficult problem. If resolution and rule firing
can be parallelized, then the entire inference cycle can be parallelized.
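
The match-resolve-fire cycle can be sketched in a few lines. This is a deliberately
naive illustration and not OPS5 or RETE: a rule is assumed to be a Python function
that yields `(additions, deletions)` pairs for a given factbase, the CRS is a function
that selects and orders instances from the conflict set, and only instances that would
change the factbase are considered (a crude stand-in for refraction). All names are
hypothetical.

```python
def run(rules, facts, crs, max_cycles=100):
    """Repeat match, resolve, fire until no rule instance would change the facts."""
    facts = set(facts)
    for _ in range(max_cycles):
        # Match: collect every rule instance whose firing would change the factbase.
        conflict_set = [
            (name, adds, dels)
            for name, match in rules
            for adds, dels in match(facts)
            if (not adds <= facts) or (dels & facts)
        ]
        if not conflict_set:
            break
        # Resolve: the CRS decides which instances fire and in what order.
        for name, adds, dels in crs(conflict_set):
            # Fire: apply retractions, then assertions.
            facts -= dels
            facts |= adds
    return facts


def grandparent(facts):
    """parent(x, y), parent(y, z) => grandparent(x, z); retracts nothing."""
    for f in facts:
        for g in facts:
            if f[0] == "parent" and g[0] == "parent" and f[2] == g[1]:
                yield {("grandparent", f[1], g[2])}, set()


def fire_one(conflict_set):
    """OPS5-like CRS: a single instance per cycle (here simply the first)."""
    return conflict_set[:1]


def fire_all(conflict_set):
    """CRS that fires every matched instance in the same cycle."""
    return conflict_set
```

With a monotonic rule such as `grandparent`, both strategies reach the same factbase;
with retractions, the choice of CRS can change the outcome.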

Works  on  parallelizing  the  match  phase  have  focused  on  parallelizing  the  RETE 
algorithm  on  different  architectures.  The  main  advantage  of  RETE  was  that  it  could 
quickly  match  many  rules  (“productions”),  but  this  advantage  is  of  less  importance 
in  the  present  landscape  given  the  change  of  focus  from  many  rules  to  much  data 
[8].  Parallelizing  the  match  phase  with  few  rules  is  well  understood,  particularly 
given  the  vast  research  on  parallelizing  relational  queries  (to  which  rule  conditions 
can  typically  be  reduced). 

This thesis focuses on parallelizing the entire inference cycle; therefore, it is
more important to consider previous work toward that end. The difficulty in
parallelizing the inference cycle is in simultaneous firing of rules such that the effect is
somehow similar to firing the rules under a sequential CRS. In OPS5, in each inference
cycle, only a single rule instance can be selected for firing [28], and therefore,
there is a deterministic order in which individual rule instances are fired. Given this
context, Schmolze proposed the serializability criterion for correctness of parallel
rule firing:

“We  say  that  the  coexecution  of  many  instantiations  is  serializable  if  and 
only  if  there  exists  some  serial  execution  that  would  produce  the  same 
result  using  the  same  instantiations.”  [14] 

In other words, under the serializability criterion, the result of firing multiple
rule instances in parallel must be the same as firing the same instances in some
valid order under the sequential CRS. Schmolze argues that ensuring serializability
is important because developers rely on the CRS to understand and enforce the
semantics of their programs (rules).

To satisfy the serializability criterion, Kuo and Moldovan [9] define two problems
that, when solved, ensure serializability.

•  The compatibility problem is the problem of firing incompatible rule instances.
Conditions are given by Kuo and Moldovan for when rule instances are
considered compatible. One is that the firing of one rule instance before another
should not preclude the firing of the latter rule instance (e.g., one rule instance
retracts a fact that is used to match the condition of the other rule instance).
Another is that the actions of two rule instances should not be contradictory
(e.g., one rule instance asserts a fact that the other retracts).

•  The convergence problem is the problem of ordering the individual, parallel rule
firings such that they are serializable. Ensuring compatibility simply means
that a single, simultaneous firing of (compatible) rule instances will have a
valid serialization. However, it does not guarantee that the ordering of such
simultaneous firings will have an overall valid serialization. In other words,
arbitrarily chaining together two valid serializations does not necessarily create
a single valid serialization. This is the convergence problem.
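
The two example compatibility conditions given in the first bullet can be written as a
simple pairwise check. This sketch covers only those two examples and should not be
read as the full set of conditions from Kuo and Moldovan [9]; the `Instance` structure
and its field names are assumptions made for illustration.

```python
from typing import NamedTuple


class Instance(NamedTuple):
    matched: frozenset   # facts the rule instance's condition matched on
    asserts: frozenset   # facts the instance adds when fired
    retracts: frozenset  # facts the instance removes when fired


def compatible(a: Instance, b: Instance) -> bool:
    """True if neither example condition for incompatibility applies."""
    # One instance retracts a fact the other needs in order to match.
    disables = (a.retracts & b.matched) or (b.retracts & a.matched)
    # One instance asserts a fact that the other retracts.
    contradicts = (a.asserts & b.retracts) or (b.asserts & a.retracts)
    return not disables and not contradicts
```

A set of instances selected for one simultaneous firing would need `compatible` to
hold for every pair; that alone says nothing about the convergence problem.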

The compatibility problem makes particular sense for the OPS5 PRS which
requires that only a single rule instance be selected for firing in any given cycle.
Thus, the firing of one rule instance can indeed cause another (unfired) rule instance
to be unmatched in subsequent inference cycles. That is, it is actually possible for
the firing of a rule instance to preclude the firing of another rule instance. If the
requirement that only a single rule instance be fired per cycle is relaxed, then the
compatibility problem becomes less of an issue. This will be further addressed in
the next chapter.

Concerning the convergence problem, one possible solution observed by Kuo
and Moldovan [9] is to simply replace the semantics of a sequential CRS with that
of a parallel CRS. The benefit of such an approach is that a parallel CRS can still
provide some semantics on which a developer may rely, argued to be essential by
Schmolze [14]. However, the burden of defining the semantics of the parallel CRS
has commonly been relegated to developers, who must define so-called “metarules” that
control the parallel execution of the PRS [8].

Another  approach  to  solving  the  convergence  problem  (as  observed  by  Kuo  and 
Moldovan  [9])  is  to  introduce  non-determinism.  If  the  order  of  firing  rule  instances 
(or  some  set  of  rule  instances)  is  completely  non-deterministic,  then  any  ordering 
satisfies  the  serializability  criterion. 

This  characteristic  -  that  any  ordering  produces  correct  results  -  is  actually 
a  strong  criterion  called  the  commutativity  criterion  proposed  by  Ishida  and  Stolfa 
[15].  In  the  case  of  a  completely  non-deterministic  CRS,  ordering  of  rule  instances 
can  be  arbitrary,  and  thus  any  parallel  execution  can  be  considered  correct.  However, 
outside  of  such  cases,  the  developer  must  prove  to  him/herself  that  any  ordering  of 
rule  instances  would  produce  “correct”  results.  As  pointed  out  by  Amaral  and 
Ghosh  [16],  the  commutativity  criterion  is  generally  considered  too  strict. 
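
To make the strictness of the commutativity criterion concrete, the following sketch
tests it by brute force for a fixed set of rule instances. It is a simplification:
each instance is assumed to be an `(additions, deletions)` pair that is fired
regardless of whether its condition still matches, and the check enumerates all
orderings, so it is only usable for a handful of instances. The function names are
hypothetical.

```python
from itertools import permutations


def fire(facts, instance):
    """Apply one rule instance: remove its deletions, then add its additions."""
    adds, dels = instance
    return (facts - dels) | adds


def is_commutative(facts, instances):
    """True if every firing order of the instances yields the same factbase."""
    outcomes = set()
    for order in permutations(instances):
        state = frozenset(facts)
        for inst in order:
            state = fire(state, inst)
        outcomes.add(state)
    return len(outcomes) == 1
```

Serializability asks only that the parallel result match some ordering; this check
demands agreement across all of them, which is why it fails as soon as one instance
asserts what another retracts.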

Returning to the serializability criterion and non-deterministic CRSs, there
have been multiple approaches to introducing non-determinism, but they can be
roughly classified into two cases: providing mechanisms to the developer to effectively
direct non-determinism (i.e., introduce some determinism), and dividing rules
into sequential and parallel sets. Sequential sets are always executed sequentially,
but parallel sets are executed either non-deterministically (arbitrary ordering) or
simultaneously (similar to inflationary semantics in Datalog with negation [6]). These
rulesets are determined by developers, and in some cases, there is a control flow
between active rulesets (called “contexts”) that provides some determinism.

These  approaches  may  have  seemed  practical  when  developers  were  the  only 
users  under  consideration.  However,  given  the  broad  audience  of  the  (Semantic) 
Web,  it  seems  unduly  onerous  to  require  relatively  lay  people  to  understand  so 
many  fine  details.  The  most  practical  approach  seems  to  be  the  formulation  of  a 
parallel  CRS.  A  class  of  CRSs  amenable  to  parallelization  is  defined  in  chapter  3. 

2.4  Scalable,  Rule-based  Inference  on  the  Semantic  Web 

Related works in this area are relatively recent. Perhaps one of the earliest
works, from 2006, is toward partitioning OWL knowledge bases for distributed
reasoning by Guo and Heflin [30], although it is focused on DLs rather than rules.
Liebig and Muller [31] as well as Bock [32] considered similar problems.

Regarding parallel rule inference, though, the earliest works specific to the
Semantic Web seem to go back to 2008, focusing exclusively on distributed memory
architectures with the exceptions of [33, 34]. Soma and Prasanna considered
rule partitioning and data partitioning approaches to parallel OWL Horst inference
[35]. OWL Horst is an established, de facto standard set of rules based on the pD*
fragment [36, 37] of OWL 1 [38]. Kaoudi et al. presented work on RDFS inference
on distributed hash tables (DHTs) [39]. At the 2008 Billion Triples Challenge,
Anadiotis et al. presented MaRVIN, a system for sound and eventually complete
RDFS inference on clusters that won third place [40], which also led to subsequent
research in 2009 by Oren et al. [41, 42]. MaRVIN was different from previous (and
even subsequent) approaches in that it did not necessarily guarantee completeness,
although evaluation suggested that it would at least come very close, particularly
when using suitable heuristics. Another work distinct from its peers is the work
on approximate reasoning by Rudolph et al. [34] in which multiple inference systems
were combined not to decrease latency but to increase availability and quality
of answers for any given amount of execution time. This is an example of
non-distributed, multi-search, as opposed to all other works presented in this section,
which are distributed, non-multi-search.

In 2009, Hendler and I [21] demonstrated that certain characteristics of the
RDFS rules allow RDFS inference to be performed in an embarrassingly parallel
fashion and still achieve soundness and completeness if the data meet some
common conditions (e.g., the rdfs:subPropertyOf property has no
subproperties). These results were used in the system that was part of the
champion submission to the 2009 Billion Triples Challenge by Williams et al.
[43, 44]. At the same conference, Urbani et al. presented similar findings
regarding parallel RDFS inference in a MapReduce framework [20]. Quite
recently, Patel-Schneider [45] expanded on the details of the conditions
assumed by Hendler and me, and by Urbani et al., showing that producing the
truly complete finite RDFS closure (no assumed conditions on the data) is
inherently serial, where here, “serial” is meant in the theoretical sense,
that is, not being completely parallel.

In 2010, Urbani et al. extended their work to OWL Horst inference [46],
achieving the largest closures on both real-world (on the order of billions of
triples) and synthetically generated datasets (100 billion triples) to date.
Kotoulas et al. utilized previous MaRVIN work to address the issue of
load-balancing in parallel RDFS inference caused by data skew [47]. Hogan et
al. presented optimizations for distributed, rule-based reasoning using a
subset of the OWL2RL rules [19]. This work by Hogan et al. is particularly
interesting for two reasons. First, the inference system disallows inferences
resulting from “ontology hijacking” [48], an apparent misuse of ontologies.
Second, this work mathematically derives conditions for soundness and
completeness for rule-based reasoning (specifically Datalog rules with focus
on linear recursion) with ontologies. Goodman and Mizell applied the findings
of [20, 21] to implement and optimize RDFS inference on a Cray XMT [33], the
first work to make significant use of a shared memory architecture. This
system also contributed to the runner-up submission to the 2010 Billion
Triples Challenge [49].

All  of  these  previous  works  on  rule-based  inference  on  the  Semantic  Web  had 
been  forward  chaining  reasoners,  with  the  exception  of  [39].  In  2011,  Urbani  et  al. 
presented  a  parallel  system  for  backward-chained  inference  with  RDFS  and  OWL 
Horst  rules  [50].  The  collective  works  of  Urbani  to  date  can  be  found  in  his  recently 
defended  doctoral  dissertation  [51]. 

In 2012, Heino and Pan demonstrated parallel RDFS inference on graphics
processing units (GPUs) [52].


CHAPTER  3 

SUFFICIENT  CONDITIONS  FOR  PARALLEL 

INFERENCE 

In 2009, Hendler and I showed that a restricted version of the RDFS closure
(the full closure of which is known to be undecidable [37, 45, 53]) can be
produced in an embarrassingly parallel fashion, allowing a high degree of
scalability to be achieved by increasing the number of processors with the
amount of data [21]. The formal basis for this finding was inspection of the
RDFS rules together with the common assumption that the terminological (then
referred to as “ontological”) triples constitute a small part of the overall
data. In this chapter, I further develop these notions beyond specific
application to RDFS toward general application to production rules.

First,  it  must  be  defined  exactly  what  is  meant  by  rules.  In  section  3.1, 
the  notion  of  rules  is  defined  as  a  subset  of  the  Production  Rule  Dialect  of  RIF 
(RIF-PRD)  [54]  with  equivalent  operational  semantics.  Then  in  section  3.2,  formal 
definitions  are  given  for  parallel  inference,  including  various  definitions  for  what  it 
means  for  parallel  inference  to  be  “correct”.  In  section  3.3,  sufficient  conditions  are 
proven  for  correctness  of  parallel  inference,  which  is  the  main  contribution  of  this 
chapter. 

3.1  Rules  and  their  Semantics 

In this section, the syntax for rules and the operational semantics of
inference are defined. The basis for these definitions is RIF-PRD [54], and as
such, much of the content of this section is directly derived from that work.
However, the definitions are not exactly the same. Not only have they been
reworded and reorganized to fit the language and purpose of this work, but
also the definitions herein are more restrictive. Some of the restrictions are
merely syntactic (e.g., forcing normal form) while others actually reduce the
expressive power of the rule language. Footnotes are provided with information
regarding the differences, although they are not necessarily an exhaustive
accounting of the differences.

Additionally, mathematical notation will periodically be defined in distinct
text blocks to make clear the express intent to use a particular notation
throughout the remainder of this work.

3.1.1  Syntax  and  Notation 

Definition 1. Reserved symbols are any of the following: =, #, ##, ->, [, ],
(, ), List, If, Then, Do, Assert, Retract, And, Not, External.

Definition 2. A variable is a symbol that is not a reserved symbol and is
represented syntactically as a string in typewriter text that does not contain
whitespace and that begins with a question mark. For example, ?v.

Definition 3. A constant is a symbol that is neither reserved nor a variable,
represented syntactically as "lex"^^<datatype> where lex is an appropriately
escaped string called the lexical representation and datatype is an
Internationalized Resource Identifier (IRI) [55] identifying how lex should be
interpreted. Constants are divided into four disjoint sets: (plain) predicate
names, built-in (predicate) names, function names, and individual names. Each
non-individual name has an associated non-negative integer referred to as its
arity. For a non-individual name p, let arity(p) denote the arity of p.

Notation. Constants with certain datatypes are allowed a syntactic shorthand,
as illustrated by example in the following.

• "http://www.rpi.edu/"^^<http://www.w3.org/2007/rif#iri> can equivalently be
represented <http://www.rpi.edu/> or, in appropriate cases, using a Compact
URI (CURIE) [56];

• "label"^^<http://www.w3.org/2007/rif#local> can equivalently be represented
_label;

• "1"^^<http://www.w3.org/2001/XMLSchema#integer> can equivalently be
represented 1.

Definition 4. Any structure is said to be ground iff it contains no variables.
By definition, constants are ground and variables are not ground.

Definition 5. A term is recursively defined as any of the following:

• a constant;

• a variable;

• a (finite) list of ground terms, denoted syntactically as
List(t_1 t_2 ... t_n) where $n \geq 0$;

• a function term denoted syntactically as f(a_1 a_2 ... a_{arity(f)}) where
f is a function name and each a_i for $1 \leq i \leq \mathrm{arity}(f)$ is a
term.

Definition 6. An atomic formula is any of the following:

• an atom denoted p(a_1 ... a_{arity(p)}) where p is a predicate name and each
a_i for $1 \leq i \leq \mathrm{arity}(p)$ is a term;

• an equality formula denoted t_1=t_2 where t_1 and t_2 are terms;

• a membership formula denoted o#c where o is a term called the object, and c
is a term called the class;

• a subclass formula denoted c_1##c_2 where c_1 is a term called the subclass,
and c_2 is a term called the superclass;

• a frame denoted o[a->v] where o is a term called the object, a is a term
called the attribute, and v is a term called the value (a->v is referred to as
the slot);³

• a built-in formula⁴ denoted External(p(a_1 ... a_{arity(p)})) where p is a
built-in predicate name and each a_i for $1 \leq i \leq \mathrm{arity}(p)$ is
a term.

Definition  7.  A  fact  is  a  ground  atomic  formula. 

Definition  8.  An  independent  fact  is  a  fact  that  is  a  built-in  formula  or  an  equality 
formula  in  which  both  terms  are  identical. 

³ Unlike RIF-PRD, I restrict frame atomic formulas to a single slot. This is
just to simplify the syntax and does not reduce expressivity since the
conjunction of multiple frames with the same object
And(o[a_1->v_1] ... o[a_n->v_n]) is effectively the same as a single frame
having all slots o[a_1->v_1 ... a_n->v_n].

⁴ I have opted to refer to these kinds of formulas as “built-in formulas”
which is the more traditional terminology. RIF-PRD refers to these as
“externally-defined formulas.”

Definition 9. A dependent fact is a fact that is not an independent fact.

The  fact  is  the  fundamental  unit  of  data.  The  distinction  between  independent 
and  dependent  facts  is  not  explicit  in  RIF-PRD.  However,  such  a  distinction  is  useful 
when  defining  the  operational  semantics.  As  will  be  made  clearer  in  section  3.1.2, 
an  independent  fact  is  a  fact  that  is  implicitly  assumed,  whereas  a  dependent  fact 
must  be  explicitly  asserted. 

Definition 10. A condition (formula) is any of the following⁵:

• an atomic formula;

• a negated formula denoted Not(f) where f is an atomic formula;

• a conjunction denoted And(f_1 ... f_n) where $n \geq 0$ and each f_i for
$1 \leq i \leq n$ is either an atomic formula or a negated formula.

Notation. For a formula f, let $\mathcal{C}^+(f)$ be defined as follows:

• if f is an atomic formula, then $\mathcal{C}^+(f) = \{f\}$;

• if f is a negated formula, then $\mathcal{C}^+(f) = \emptyset$;

• if f is a conjunction And(f_1 ... f_n), then
$\mathcal{C}^+(f) = \bigcup_{i=1}^{n} \mathcal{C}^+(f_i)$.

Notation. For a formula f, let $\mathcal{C}^-(f)$ be defined as follows:

• if f is an atomic formula, then $\mathcal{C}^-(f) = \emptyset$;

• if f is a negated formula Not(f'), then $\mathcal{C}^-(f) = \{f'\}$;

• if f is a conjunction And(f_1 ... f_n), then
$\mathcal{C}^-(f) = \bigcup_{i=1}^{n} \mathcal{C}^-(f_i)$.

Notation. For a formula f, let
$\mathcal{C}(f) = \mathcal{C}^+(f) \cup \mathcal{C}^-(f)$.
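As a small worked example of the three notations above (the predicate names p,
q, and r are made up purely for illustration), consider the condition
f = And(p(?x) Not(q(?x)) r(?x ?y)). Unfolding the definitions gives:

```latex
\begin{align*}
\mathcal{C}^+(f) &= \mathcal{C}^+(p(?x)) \cup \mathcal{C}^+(\texttt{Not}(q(?x)))
                    \cup \mathcal{C}^+(r(?x\ ?y))
                  = \{p(?x)\} \cup \emptyset \cup \{r(?x\ ?y)\}
                  = \{p(?x),\ r(?x\ ?y)\}, \\
\mathcal{C}^-(f) &= \emptyset \cup \{q(?x)\} \cup \emptyset = \{q(?x)\}, \\
\mathcal{C}(f)   &= \{p(?x),\ q(?x),\ r(?x\ ?y)\}.
\end{align*}
```

Note that the negated atom q(?x) appears in the negative set without its Not
wrapper, which is what the definition for negated formulas prescribes.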

⁵ The following are differences with RIF-PRD: (1) existential formulas are
excluded; (2) disjunction formulas are excluded; (3) the subformula of a
negated formula must be atomic; (4) the subformulas of a conjunction must be
either atomic or negated formulas. The only apparent reduction in expressivity
is the absence of existential formulas; the other restrictions simply enforce
that rules be in normal form.

Definition 11. An action is one of the following⁶:

• an assertion denoted Assert(f) where f is an atom or frame;

• a retraction denoted Retract(f) where f is an atom or frame.

In either case, f is referred to as the target of the action.

Notation. For an action a, let $\mathcal{A}^+(a)$ be defined as follows:

• if a is Assert(f), then $\mathcal{A}^+(a) = \{f\}$;

• if a is Retract(f), then $\mathcal{A}^+(a) = \emptyset$.

Notation. For an action a, let $\mathcal{A}^-(a)$ be defined as follows:

• if a is Assert(f), then $\mathcal{A}^-(a) = \emptyset$;

• if a is Retract(f), then $\mathcal{A}^-(a) = \{f\}$.

Notation. For an action a, let
$\mathcal{A}(a) = \mathcal{A}^+(a) \cup \mathcal{A}^-(a)$.

Definition 12. An action block is a sequence of actions denoted
Do(a_1 ... a_n) where $n \geq 1$ and each a_i for $1 \leq i \leq n$ is an
action.⁷

Notation. For an action block $\alpha$ = Do(a_1 ... a_n), define the following
notation:

• for $n = 1$, $\mathcal{A}^+(\alpha) = \mathcal{A}^+(a_1)$ and
$\mathcal{A}^-(\alpha) = \mathcal{A}^-(a_1)$;

• for $n > 1$,

- if a_n is Assert(f), then

* $\mathcal{A}^+(\alpha) = \mathcal{A}^+(\texttt{Do}(a_1 \ldots a_{n-1})) \cup \{f\}$,

* $\mathcal{A}^-(\alpha) = \mathcal{A}^-(\texttt{Do}(a_1 \ldots a_{n-1})) \setminus \{f\}$;

- if a_n is Retract(f), then

* $\mathcal{A}^+(\alpha) = \mathcal{A}^+(\texttt{Do}(a_1 \ldots a_{n-1})) \setminus \{f\}$,

* $\mathcal{A}^-(\alpha) = \mathcal{A}^-(\texttt{Do}(a_1 \ldots a_{n-1})) \cup \{f\}$;

• $\mathcal{A}(\alpha) = \mathcal{A}^+(\alpha) \cup \mathcal{A}^-(\alpha)$.

⁶ This definition is significantly more restrictive than in RIF-PRD.
Specifically, the ability to retract all frames with a specific object and
possibly a specific attribute is excluded; the Execute action is excluded;
compound actions like the Modify action are excluded; and assertion of
membership formulas (which requires action variable declarations) is excluded.

⁷ Unlike RIF-PRD, I exclude action variable declarations. This constitutes a
loss of expressivity taken to avoid the complications of introducing new
constants.

It is important to note that the meaning of the notation
$\mathcal{A}^+(\alpha)$ and $\mathcal{A}^-(\alpha)$ for an action block
$\alpha$ is carefully defined so that (if $\alpha$ is ground)
$\mathcal{A}^+(\alpha)$ does not include asserted facts that are retracted by
subsequent actions and $\mathcal{A}^-(\alpha)$ does not include retracted facts
that are asserted by subsequent actions. This will help to avoid discussion of
trivial edge cases in later proofs.
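The recursive notation for action blocks can be read as a left-to-right scan
in which a later action on a fact overrides an earlier one. The following
Python sketch illustrates that reading only; the representation of an action
as a (kind, target) pair and the function name net_effects are assumptions
made for this example, not part of the formal definitions.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: an action is ("Assert", target) or ("Retract", target),
# with the ground target kept as an opaque string.
Action = Tuple[str, str]


def net_effects(block: List[Action]) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Return the net asserted and net retracted targets of an action block."""
    asserted, retracted = set(), set()
    for kind, target in block:
        if kind == "Assert":
            # A later assertion cancels an earlier retraction of the same fact.
            asserted.add(target)
            retracted.discard(target)
        elif kind == "Retract":
            # A later retraction cancels an earlier assertion of the same fact.
            asserted.discard(target)
            retracted.add(target)
        else:
            raise ValueError(f"unknown action kind: {kind}")
    return frozenset(asserted), frozenset(retracted)


block = [("Assert", "p(a)"), ("Retract", "p(a)"), ("Assert", "q(b)")]
net_asserted, net_retracted = net_effects(block)
print(sorted(net_asserted), sorted(net_retracted))
```

In this example p(a) is asserted and then retracted, so it ends up only in the
net retracted set, while q(b) ends up only in the net asserted set.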

Definition 13. A rule is denoted If f Then $\alpha$ where f is a condition and
$\alpha$ is an action block.⁸

Notation. For a rule r = If f Then $\alpha$, define the following notation:

• $\mathcal{C}^+(r) = \mathcal{C}^+(f)$, $\mathcal{C}^-(r) = \mathcal{C}^-(f)$,
and $\mathcal{C}(r) = \mathcal{C}(f)$;

• $\mathcal{A}^+(r) = \mathcal{A}^+(\alpha)$,
$\mathcal{A}^-(r) = \mathcal{A}^-(\alpha)$, and
$\mathcal{A}(r) = \mathcal{A}(\alpha)$.

Note that I have placed no significant restrictions on what constitutes a
rule. For example, under this definition, it is possible that
$\mathcal{C}^+(r) = \emptyset$, which immediately raises concern for the reader
who is familiar with production rules. Granted, this is unusual, but there is
no reason to restrict the syntax of rules here. Rather, I will address this
issue in definition 24.

3.1.2  Operational  Semantics 

Whereas the previous section defined the syntax of rules (i.e., what rules
look like), this section provides semantics for how to derive inferences and
take action based on the rules. It should be understood, though, that an
operational semantics differs from a declarative semantics. In a declarative
semantics, terms, propositions, rules, and other structures are first given
meaning, and then operations using those structures are derived which must
work within the confines of those semantics. The advantage of such semantics
is that meaning is objectively established. On the other hand, an operational
semantics gives meaning to the structures based solely on operations over the
structures, and so the operations are the semantics. The advantage here is
that it is arguably easier (for those who are not trained logicians) to
understand the implications of the rules and data, even if they are not as
well-rooted in a purely logic-based framework. Regardless, rules based on
operational semantics are still powerful and practical, and for some forms of
rules (e.g., Datalog), the declarative and operational semantics are
effectively the same.

⁸ Unlike in RIF-PRD, which uses the Forall keyword to indicate universal
quantification of variables, herein, all variables in a rule are implicitly
universally quantified and scoped to the rule.

The operational semantics in this section are ultimately defined in a
significantly different way than in RIF-PRD, although they are equivalent.
Here the semantics are given algorithmically rather than as a labeled terminal
transition system. The reformulation in this section is meant to simplify
later theorems and proofs.

Definition 14. A factset⁹ is a set of facts $F = F_I \cup F_D$ such that:

• $F_I$ is a (possibly infinite) set of independent facts such that
$f \in F_I$ iff one of the following holds,

- f is a built-in formula that evaluates to true according to the semantics
of its predicate,

- f is an equality formula in which both sides are identical;

• $F_D$ is a finite set of dependent facts satisfying the following
conditions:

- if c_1##c_2 $\in F_D$ and c_2##c_3 $\in F_D$, then c_1##c_3 $\in F_D$;

- if o#c_1 $\in F_D$ and c_1##c_2 $\in F_D$, then o#c_2 $\in F_D$;

- if t_1=t_2 $\in F_D$ and t_2=t_3 $\in F_D$, then t_1=t_3 $\in F_D$;

- if t_1=t_2 $\in F_D$, then t_2=t_1 $\in F_D$.

A factset is the form of dataset for inference as discussed herein. The
conditions in definition 14 merely ensure coherence of the factset (e.g.,
equality should be symmetric and transitive). These are nuances which need to
be explicitly stated for completeness but which will not have much impact
hereafter.

⁹ RIF-PRD uses a similar notion referred to as a “State of the Fact Base”
which effectively corresponds to $F_D$ in the definition of factset. However,
keeping independent facts “outside” of the factset causes pain in later
proofs. It is easier and just as valid (or so I claim) to consider independent
facts as being implicitly present in every factset.

Definition 15. A substitution is a finite function $\sigma$ from variables to
terms.

Notation. For any structure (condition, rule, etc.) s, let $\sigma(s)$ denote
s having each variable v occurring in s that is also in the domain of
$\sigma$, replaced by $\sigma(v)$.

Definition 16. A ground substitution is a substitution such that the range
consists only of ground terms.

Definition 17. A ground formula f is said to match a factset F iff one of the
following is true:

• f is a fact and $f \in F$;

• f = Not(f') and $f' \notin F$;

• f = And(f_1 ... f_n) and for all i such that $1 \leq i \leq n$, f_i matches
F.

Proposition 1. A ground formula f matches a factset F iff the following hold:

• $\mathcal{C}^+(f) \subseteq F$;

• $\mathcal{C}^-(f) \cap F = \emptyset$.
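To illustrate how the recursive definition of matching and the set-based
characterization in Proposition 1 agree, the sketch below implements both for
ground formulas and compares them on two examples. It is an illustration under
simplifying assumptions: facts are opaque strings, formulas are nested tuples,
the factset is passed as a finite set of dependent facts only (independent
facts are ignored), and all function names are hypothetical.

```python
from typing import FrozenSet, Tuple

Fact = str
# Hypothetical encoding of a ground formula:
#   a fact (str), ("Not", fact), or ("And", (subformula, ...)).


def matches(f, facts: FrozenSet[Fact]) -> bool:
    """Recursive matching, following the three cases of the definition."""
    if isinstance(f, str):
        return f in facts
    tag, body = f
    if tag == "Not":
        return body not in facts
    if tag == "And":
        return all(matches(g, facts) for g in body)
    raise ValueError(f"unknown formula tag: {tag}")


def split(f) -> Tuple[FrozenSet[Fact], FrozenSet[Fact]]:
    """Return the positive and negative atomic subformulas of f."""
    if isinstance(f, str):
        return frozenset({f}), frozenset()
    tag, body = f
    if tag == "Not":
        return frozenset(), frozenset({body})
    pos, neg = frozenset(), frozenset()
    for g in body:
        p, n = split(g)
        pos, neg = pos | p, neg | n
    return pos, neg


def matches_by_sets(f, facts: FrozenSet[Fact]) -> bool:
    """Set-based test: positives contained in, negatives disjoint from, facts."""
    pos, neg = split(f)
    return pos <= facts and neg.isdisjoint(facts)


facts = frozenset({"p(a)", "r(a b)"})
f1 = ("And", ("p(a)", ("Not", "q(a)"), "r(a b)"))
f2 = ("And", ("p(a)", ("Not", "r(a b)")))
print(matches(f1, facts), matches_by_sets(f1, facts))
print(matches(f2, facts), matches_by_sets(f2, facts))
```

The set-based form is convenient in practice because the positive and negative
sets of a rule condition can be computed once and then tested against many
factsets.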

Definition 18. A non-ground formula f is said to match a factset F iff there
exists a ground substitution $\sigma$ such that $\sigma(f)$ is a ground formula
that matches F.

Definition 19. The result of applying a ground action a to a factset F,
denoted $a(F)$, is defined as follows:

• if a = Assert(f), then $a(F) = F \cup \{f\}$;

• if a = Retract(f), then $a(F) = F \setminus \{f\}$.

From this point on, the definitions differ significantly from RIF-PRD.

Definition 20. The result of applying a ground action block
$\alpha$ = Do(a_1 ... a_n) to a factset F, denoted $\alpha(F)$, is defined as
$\alpha(F) = a_n \circ \cdots \circ a_1(F)$.

Definition 21. The result of firing a ground rule $\rho$ = If f Then $\alpha$
on a factset F, denoted $\rho(F)$, is defined as $\alpha(F)$ (regardless of
whether f matches F).

Definition 22. A ground rule $\rho$ is said to be an instance of a rule r iff
there exists a ground substitution $\sigma$ such that $\sigma(r) = \rho$.

Definition 23. A rule is said to be matchable iff there exists an instance
$\rho$ of the rule and a factset F such that the condition formula of $\rho$
matches F. A rule that is not matchable is said to be unmatchable.

Definition  24.  A  rule  r  is  said  to  be  finitely  matchable  iff  r  is  matchable  and  for 
any  factset  F,  the  set  of  all  rule  instances  of  r  that  match  F  is  finite. 

Definition  25.  A  ruleset  is  a  set  of  finitely  matchable  rules. 

Definition 26. Given a factset F and a ruleset R, the conflict set wrt R and
F, denoted conf(R, F), is the set of all rule instances such that for each
$\rho \in \mathrm{conf}(R, F)$, $\rho$ is an instance of a rule in R that has
a condition formula that matches F.

Proposition 2. A ground rule $\rho$ is in conf(R, F) iff the following hold:

• $\rho$ is an instance of some rule in R;

• $\mathcal{C}^+(\rho) \subseteq F$;

• $\mathcal{C}^-(\rho) \cap F = \emptyset$.

Earlier in section 3.1.1, I pointed out that it was unusual that by my
definition of rule, it is syntactically valid that a rule have no non-negated
subformula in its condition. The peculiarity there is that it seems as though
a rule could have infinitely many rule instances, and thus a conflict set
could be infinite, leading to (as will be clear from later definitions)
non-terminating inference cycles. However, this has been prevented by
definitions 24 and 25. Although the characteristic of being “finitely
matchable” is usually syntactically enforced, the nuances and complexities of
the syntactic restrictions that provide such a guarantee are unnecessary for
this work. All that matters here is that the rules are indeed finitely
matchable.

Definition 27. Additional information is an intentionally vaguely defined
notion of state that an inference system can keep and modify throughout the
inference process, and which may affect the outcome of inference.

As  stated  in  the  definition,  the  idea  of  “additional  information”  is  intentionally 
vague,  and  I  will  not  make  much  use  of  it.  However,  it  is  included  for  compatibility 
purposes  for  the  way  many  systems  actually  work.  For  example,  in  RIF-PRD, 
it  is  assumed  that  a  history  of  factsets  is  (effectively)  kept  as  inference  progresses. 
Such  information  can  be  used  by  conflict  resolution  strategies  and  halting  conditions 
(defined  hereafter). 

Definition 28. A conflict resolution strategy (CRS) is a function S such that
for any set of matchable ground rules R, any factset F, and some additional
information I, $S(I, R, F) = (\rho_i)_{i=1}^{n}$ where $n \geq 0$,
$|\{\rho_i\}_{i=1}^{n}| = n$, and
$\{\rho_i\}_{i=1}^{n} \subseteq \mathrm{conf}(R, F)$.

The definition of CRS requires some clarification.
$\{\rho_i\}_{i=1}^{n} \subseteq \mathrm{conf}(R, F)$ means that S will select
a finite number of rules from the conflict set. $|\{\rho_i\}_{i=1}^{n}| = n$
means that $(\rho_i)_{i=1}^{n}$ will not contain any duplicate ground rules.

Definition 29. A halting condition H is a function such that given a CRS S, a
ruleset R, a factset F, and some additional information I, $H(I, S, R, F)$ is
a boolean value.

Definition 30. An information keeper $\mathcal{I}$ is a function such that,
given a halting condition H, a CRS S, a ruleset R, a factset F, and some
additional information I, $\mathcal{I}(I, H, S, R, F)$ returns some additional
information. For any information keeper $\mathcal{I}$, let $\mathcal{I}_0$
denote some initial, additional information.

Definition 31. A program is a quadruple $(\mathcal{I}, H, S, R)$ where
$\mathcal{I}$ is an information keeper, H is a halting condition, S is a CRS,
and R is a ruleset.

Definition 32. A program instance is a quintuple $(\mathcal{I}, H, S, R, F)$
where $(\mathcal{I}, H, S, R)$ is a program and F is a factset.

Algorithm 1 provides the operational semantics for a single cycle in the
inference process. A CRS is used to determine which rule instances in the
conflict set are to be fired, and also in what order they should be fired.
Then each rule instance is fired one at a time in the order determined by the
CRS. Traditionally, a CRS would choose only a single rule [28]. However, the
operational semantics here are fashioned after the operational semantics of
RIF-PRD which allows for greater flexibility. The ability to select more than
one rule instance will prove convenient for parallel inference.

Algorithm 1: cycle(I, S, R, F)

Data: Additional information I, a CRS S, a ruleset R, and a factset F.
Result: The factset F' resulting from a single inference cycle.

1  $(\rho_i)_{i=1}^{n} = S(I, R, F)$
2  $F' = F$
3  for $j = 1$ to $n$ do
4  |   $F' = \rho_j(F')$
5  end
6  return $F'$


Algorithm 2: infer($\pi$)

Data: A program instance $\pi = (\mathcal{I}, H, S, R, F)$.
Result: The closure $F^*$, if the procedure terminates.

1  $I^* = \mathcal{I}_0$
2  $F^* = F$
3  while not $H(I^*, S, R, F^*)$ do
4  |   $I' = \mathcal{I}(I^*, H, S, R, F^*)$
5  |   $F' = \mathrm{cycle}(I', S, R, F^*)$
6  |   $I^* = I'$
7  |   $F^* = F'$
8  end
9  return $F^*$

Algorithm 2 provides the operational semantics for the entire inference
process, which is straightforward. As long as the halting condition H does not
indicate that inference should stop, additional inference cycles are
performed. The result of algorithm 2 for input $\pi$ is referred to as the
closure of $\pi$.
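The two algorithms can be sketched in Python for the special case of ground
rules. Everything concrete below is an assumption made for illustration: rules
are already ground and stored as positive facts, negative facts, and actions;
the CRS fires the whole conflict set in list order; the halting condition
stops when a further cycle would change nothing; and the information keeper
simply passes its state through unchanged.

```python
from typing import FrozenSet, List, NamedTuple, Tuple

Fact = str


class GroundRule(NamedTuple):
    pos: FrozenSet[Fact]                    # positive condition facts
    neg: FrozenSet[Fact]                    # negated condition facts
    actions: Tuple[Tuple[str, Fact], ...]   # ("Assert" | "Retract", target)


def fire(rule: GroundRule, facts: FrozenSet[Fact]) -> FrozenSet[Fact]:
    """Apply the rule's action block to the factset, one action at a time."""
    out = set(facts)
    for kind, target in rule.actions:
        if kind == "Assert":
            out.add(target)
        else:
            out.discard(target)
    return frozenset(out)


def conflict_set(rules: List[GroundRule], facts: FrozenSet[Fact]) -> List[GroundRule]:
    """Ground rules whose condition matches the factset."""
    return [r for r in rules if r.pos <= facts and r.neg.isdisjoint(facts)]


def cycle(info, crs, rules, facts):
    """One inference cycle: fire the rule instances chosen by the CRS, in order."""
    new_facts = facts
    for rule in crs(info, rules, facts):
        new_facts = fire(rule, new_facts)
    return new_facts


def infer(keeper, initial_info, halt, crs, rules, facts):
    """Repeat cycles until the halting condition holds; return the closure."""
    info, closure = initial_info, facts
    while not halt(info, crs, rules, closure):
        next_info = keeper(info, halt, crs, rules, closure)
        next_facts = cycle(next_info, crs, rules, closure)
        info, closure = next_info, next_facts
    return closure


# Illustrative choices for the CRS, halting condition, and information keeper.
def fire_all(info, rules, facts):
    return conflict_set(rules, facts)


def halt_at_fixpoint(info, crs, rules, facts):
    return cycle(info, crs, rules, facts) == facts


def keep_nothing(info, halt, crs, rules, facts):
    return info


rules = [
    GroundRule(frozenset({"p(a)"}), frozenset(), (("Assert", "q(a)"),)),
    GroundRule(frozenset({"q(a)"}), frozenset(), (("Assert", "r(a)"),)),
]
closure = infer(keep_nothing, None, halt_at_fixpoint, fire_all,
                rules, frozenset({"p(a)"}))
print(sorted(closure))
```

Because the CRS and halting condition are passed in as functions, the same
loop could be reused with a strategy that selects a single rule instance per
cycle, which is the traditional behavior mentioned above.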

Definition 33. The length $\omega$ of infer($\pi$) is a non-negative integer or
$\infty$ such that $\omega$ is the number of times the loop at line 3
iterates.

Definition 34. The $i$th factset when calling infer($\pi$) is the factset
$F^*$ from line 7 during the $\min\{\omega, i\}$th iteration of the loop at
line 3, where $\omega$ is the length of infer($\pi$).

Note that even if the length of infer($\pi$) is some integer $\omega$ (not
$\infty$), there is still a meaningful definition for the $i$th factset when
calling infer($\pi$) even when $i > \omega$, which is the same as the
$\omega$th factset. Loosely phrased, the closure is “carried forward”
indefinitely.

Definition 35. A program instance $\pi$ is said to terminate iff the length of
infer($\pi$) is not $\infty$.

Definition 36. A program $\Pi$ is said to terminate iff every instance $\pi$
of $\Pi$ terminates.

3.2  Distribution 

In  this  section,  I  introduce  the  formal  concepts  of  parallel  inference.  Unlike 
the  previous  sections  in  this  chapter  that  introduced  variations  and  reformations  on 
rules  and  their  operational  semantics  as  defined  in  RIF-PRD,  this  section  begins  the 
significantly  novel  contributions  of  this  chapter. 

Definition 37. A distribution scheme is a triple
$\mathcal{D} = (\mathcal{N}, \phi, \theta)$ where:

• $\mathcal{N} = \{i\}_{i=0}^{n-1}$ for some non-negative integer n;

• $\phi$ is a function mapping facts to subsets of $\mathcal{N}$;

• $\theta$ is a function mapping facts to non-empty subsets of $\mathcal{N}$;

• for any fact f,

- $\phi(f) \subseteq \theta(f)$ and

- $[\phi(f) = \emptyset] \rightarrow [\theta(f) = \mathcal{N}]$.

The intuition behind the definition of distribution scheme is as follows.
There is some finite set of processor identifiers given by $\mathcal{N}$. For
any fact f, $\phi$ determines which processors must have f (should f be
present in a factset), and $\theta$ determines which processors may (or are
allowed to) have f. Another way to think of it is that $\phi(f)$ is the set of
processors guaranteed to have f (should f be present among the processors),
and $\mathcal{N} \setminus \theta(f)$ is the set of processors guaranteed not
to have f. The necessity of $\phi$ is apparent, but perhaps the utility of
$\theta$ is less obvious. $\theta$ is important for supporting parallel
inference with negation and retraction, as will be demonstrated in later
theorems and proofs. Note that there are two conditions on the relationship
between $\phi$ and $\theta$. The first is that, for any fact f, if a processor
must have f, then it must also be allowed to have f. The second is that, if no
processor is required to have f, then any processor is allowed to have f. That
is, if $\mathcal{D}$ does not give any guarantee about where f is to be
placed, then $\mathcal{D}$ must allow f to be placed arbitrarily.

Definition 38. Given a distribution scheme
$\mathcal{D} = (\mathcal{N}, \phi, \theta)$, a $\mathcal{D}$-distribution of a
factset F is a total function $\Gamma$ from $\mathcal{N}$ to subsets of F such
that the following hold:

• if $f \in F$ and $k \in \phi(f)$, then $f \in \Gamma(k)$;

• if $f \in F$ and $k \notin \theta(f)$, then $f \notin \Gamma(k)$;

• $F = \bigcup_{k \in \mathcal{N}} \Gamma(k)$.

The $\mathcal{D}$-distribution of a factset F is an assignment of facts in F
to processors identified by the integers in $\mathcal{N}$. $\Gamma(k)$ is the
subset of F assigned to processor k. Note that there are three conditions on a
valid $\mathcal{D}$-distribution of F. First, if a fact f is in the factset F,
and if k must have f, then indeed, processor k is assigned f. Second, if a
fact f is in the factset F, and if k is not allowed to have f, then indeed,
processor k is not assigned f. Finally, the union over the processors’
assigned factsets equals the (non-distributed) factset F.
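The three conditions on a valid distribution, together with the two
conditions relating the "must have" and "may have" functions of a distribution
scheme, are straightforward to check mechanically. The sketch below does so
under illustrative assumptions: processors are the integers 0 to n-1, facts are
opaque strings, the two functions are ordinary Python callables named
must_have and may_have, and the assignment of facts to processors is a
dictionary. None of these names come from the formal definitions.

```python
from typing import Callable, Dict, Set

Fact = str
Placement = Callable[[Fact], Set[int]]


def scheme_ok(n: int, must_have: Placement, may_have: Placement, fact: Fact) -> bool:
    """Check the scheme conditions for one fact."""
    procs = set(range(n))
    must, may = must_have(fact), may_have(fact)
    return (
        bool(may) and may <= procs          # "may have" is a non-empty subset
        and must <= may                     # required implies allowed
        and (bool(must) or may == procs)    # no requirement implies no restriction
    )


def is_distribution(n: int, must_have: Placement, may_have: Placement,
                    facts: Set[Fact], assignment: Dict[int, Set[Fact]]) -> bool:
    """Check the three conditions on an assignment of facts to processors."""
    procs = range(n)
    for f in facts:
        for k in procs:
            if k in must_have(f) and f not in assignment[k]:
                return False                # a required copy is missing
            if k not in may_have(f) and f in assignment[k]:
                return False                # a forbidden copy is present
    return set().union(*(assignment[k] for k in procs)) == facts


# Example: schema-like facts are replicated everywhere, the rest go anywhere.
def must_have(f: Fact) -> Set[int]:
    return {0, 1} if f.startswith("sub(") else set()


def may_have(f: Fact) -> Set[int]:
    return {0, 1}


facts = {"sub(A,B)", "type(x,A)", "type(y,A)"}
good = {0: {"sub(A,B)", "type(x,A)"}, 1: {"sub(A,B)", "type(y,A)"}}
bad = {0: {"sub(A,B)", "type(x,A)"}, 1: {"type(y,A)"}}
print(is_distribution(2, must_have, may_have, facts, good))
print(is_distribution(2, must_have, may_have, facts, bad))
```

The second assignment is rejected because processor 1 lacks a fact that it is
required to hold.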

Algorithm 3 provides the operational semantics for parallel inference. It is a
variation on algorithm 2 that accounts for distribution of data using the
previously defined concepts. On line 1, the input factset F is distributed
among processors (identified by $\mathcal{N}$) such that $\Gamma$ is a
$\mathcal{D}$-distribution of F. The loop at line 2 is executed in parallel as
indicated by the “pardo”. The body of that loop is essentially the same as the
contents of algorithm 2 except for one important difference at line 9. Line 9
amounts to saying that if a processor infers a fact f that it is not allowed
to have (determined by $\theta$), then f is removed directly after the cycle in
which it is inferred. After each processor finishes, then on line 12, the
union is taken over the processors’ results, and that is the final result
returned on line 13, referred to as the $\mathcal{D}$-closure of $\pi$.

Algorithm 3: parinfer($\mathcal{D}$, $\pi$)

Data: A distribution scheme $\mathcal{D} = (\mathcal{N}, \phi, \theta)$ and a
program instance $\pi = (\mathcal{I}, H, S, R, F)$.
Result: The parallel $\mathcal{D}$-closure $F^*$, if the procedure terminates.

1   let $\Gamma$ be any $\mathcal{D}$-distribution of F
2   for $p \in \mathcal{N}$ pardo
3   |   $I^*_p = \mathcal{I}_0$
4   |   $F^*_p = \Gamma(p)$
5   |   while not $H(I^*_p, S, R, F^*_p)$ do
6   |   |   $I'_p = \mathcal{I}(I^*_p, H, S, R, F^*_p)$
7   |   |   $F'_p = \mathrm{cycle}(I'_p, S, R, F^*_p)$
8   |   |   $I^*_p = I'_p$
9   |   |   $F^*_p = \{f \mid f \in F'_p \wedge p \in \theta(f)\}$
10  |   end
11  end
12  $F^* = \bigcup_{p \in \mathcal{N}} F^*_p$
13  return $F^*$

Line 9 makes parallelization of inference slightly non-trivial. If $\theta(f) = \mathcal{N}$ for
every $f$ (i.e., any processor is allowed to have any fact), then parallelization consists
of merely $\mathcal{D}$-distributing the factset and letting each processor perform sequential
inference using algorithm 2. However, if there exists a fact $f$ such that $\theta(f) \subset \mathcal{N}$
(i.e., some processor is not allowed to have some fact), then the filtering at line 9
causes algorithm 3 to be effectively different from the case in which each processor
merely executes algorithm 2.
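The control flow of algorithm 3 can be sketched in Python under some simplifying assumptions that are not part of the original: the information keeper is omitted, a fixpoint halting condition is assumed, the "pardo" loop is simulated sequentially, and `cycle` hard-codes one toy rule, type(x, a) and sub(a, b) imply type(x, b). All function and variable names are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple

Fact = Tuple[str, str, str]


def cycle(facts: Set[Fact]) -> Set[Fact]:
    """One inference cycle for the toy rule type(x, a), sub(a, b) -> type(x, b)."""
    derived = {("type", x, b)
               for (t1, x, a) in facts if t1 == "type"
               for (t2, a2, b) in facts if t2 == "sub" and a2 == a}
    return facts | derived


def infer(facts: Set[Fact], max_cycles: int = 1000) -> Set[Fact]:
    """Sequential inference: repeat cycles until a fixpoint is reached."""
    current = set(facts)
    for _ in range(max_cycles):
        nxt = cycle(current)
        if nxt == current:
            break
        current = nxt
    return current


def parinfer(procs: FrozenSet[int],
             theta: Callable[[Fact], FrozenSet[int]],
             dist: Dict[int, Set[Fact]],
             max_cycles: int = 1000) -> Set[Fact]:
    """Simulated parallel inference with the filtering step of line 9."""
    closures: Dict[int, Set[Fact]] = {}
    for p in procs:  # stands in for "pardo": iterations share no state
        local = set(dist[p])
        for _ in range(max_cycles):
            inferred = cycle(local)
            if inferred == local:  # assumed fixpoint halting condition
                break
            # Line 9: drop inferred facts that processor p may not hold.
            local = {f for f in inferred if p in theta(f)}
        closures[p] = local
    result: Set[Fact] = set()
    for local in closures.values():
        result |= local
    return result


PROCS = frozenset({0, 1})
theta = lambda f: PROCS  # every processor may hold every fact
SUBS = {("sub", "Cat", "Mammal"), ("sub", "Mammal", "Animal")}
F = SUBS | {("type", "tom", "Cat"), ("type", "rex", "Mammal")}
dist = {0: SUBS | {("type", "tom", "Cat")}, 1: SUBS | {("type", "rex", "Mammal")}}
```

With the `sub` facts replicated and the `type` facts split, `parinfer(PROCS, theta, dist)` agrees with `infer(F)` on this input. That agreement is a property of this particular rule and placement, not of the algorithm in general.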

Note also that algorithm 3 does not ensure on its own that the $\mathcal{D}$-closure of
$\pi$ is the same as the (sequential) closure of $\pi$. It merely provides a mechanism of
parallel inference. Whether the $\mathcal{D}$-closure of $\pi$ is the same as the (sequential) closure
of $\pi$ (assuming such closures exist) depends on the relationship between $\mathcal{D}$ and $\pi$,
which is the topic of section 3.3.

Definition 39. The length $\omega_k$ of $\mathit{parinfer}(\mathcal{D}, \pi)$ for processor $k$ is a non-negative
integer or $\infty$ such that $\omega_k$ is the number of times in algorithm 3 that the loop at
line 5 iterates when the loop at line 2 is on iteration $k \in \mathcal{N}$.

Definition 40. The $i$th factset for processor $p$ when calling $\mathit{parinfer}(\mathcal{D}, \pi)$ is the
factset $F^*_p$ from line 9 in algorithm 3 during the $\min\{\omega_p, i\}$th iteration of the loop
at line 5 when the loop at line 2 is on iteration $p \in \mathcal{N}$, where $\omega_p$ is the length of
$\mathit{parinfer}(\mathcal{D}, \pi)$ for processor $p$.

Put more plainly, the length of $\mathit{parinfer}(\mathcal{D}, \pi)$ for processor $k$ is the number
of inference cycles performed by processor $k$ (which may be $\infty$). The $i$th factset for
processor $k$ when calling $\mathit{parinfer}(\mathcal{D}, \pi)$ is the factset held by processor $k$ after the
$i$th cycle, or if $i$ is greater than the length (number of cycles) $\omega_k$ for processor $k$, it is
the factset held by processor $k$ after the $\omega_k$th cycle (i.e., processor $k$'s local closure).

3.2.1  Definitions  for  Correct  Parallel  Inference 

In this section, definitions are given for correctness¹⁰ of parallel inference.
Most intuitively, if the result in parallel is the same as the result in sequential, then
parallel inference is considered correct.

Definition 41. Given a distribution scheme $\mathcal{D}$, a program $\Pi$ is weakly $\mathcal{D}$-parallel
iff for any instance $\pi$ of $\Pi$, the following hold:

• $\mathit{infer}(\pi)$ terminates;

• $\mathit{parinfer}(\mathcal{D}, \pi)$ terminates;

• $\mathit{infer}(\pi) = \mathit{parinfer}(\mathcal{D}, \pi)$.

Definition 41 is the simplest definition for correct parallel inference. It only
cares about the answers. It is not concerned with, for example, order of rule firings
or whether the individual cycles (logically) synchronize in some way. It only cares
that the final results are the same. However, not all programs are guaranteed to
terminate, and so definition 41 is not useful in those cases. Therefore, the following
definition is introduced.

¹⁰ For logicians, correctness as defined in this section implies soundness and completeness.

Definition 42. Given a distribution scheme $\mathcal{D}$, a program $\Pi$ is cyclically $\mathcal{D}$-parallel
iff for any instance $\pi$ of $\Pi$, for all $i \geq 1$, letting $F_i$ be the $i$th factset when
calling $\mathit{infer}(\pi)$, and letting $F_{p,i}$ be the $i$th factset for processor $p$ when calling
$\mathit{parinfer}(\mathcal{D}, \pi)$, $F_i = \bigcup_{p \in \mathcal{N}} F_{p,i}$.

The intuition behind definition 42 is that the union over the processors' local
factsets after $i$ cycles should be the same as the (sequentially produced) factset after
$i$ cycles. Therefore, even if a program does not terminate, if the number of cycles
is fixed, then being cyclically $\mathcal{D}$-parallel means that the results will be the same in
parallel as in sequential when the input factset is $\mathcal{D}$-distributed.

Proposition 3. If a program $\Pi$ is cyclically $\mathcal{D}$-parallel and $\Pi$ terminates, then $\Pi$
is weakly $\mathcal{D}$-parallel.

Proof. Straightforward from definitions 36, 41, and 42. □

Definition 42 is a strong enough definition for the goal of this work, but in
traditional, parallel, production rule systems literature, a stronger notion of correct
parallel inference has been used, based on the serializability criterion [14]. Therefore,
one more definition is given for parallel inference.

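Definition 43. A set of sequences $\{\langle x_{p,j} \rangle_{j=1}^{m_p}\}_{p \in \mathcal{N}}$ interleaves to a sequence $\langle y_i \rangle_{i=1}^{n}$
iff one of the following holds:

• if $n = 0$, then for all $p \in \mathcal{N}$, $m_p = 0$;

• if $n > 0$, then $\{S'_p\}_{p \in \mathcal{N}}$ interleaves to $\langle y_i \rangle_{i=2}^{n}$ where

  – there exists $p \in \mathcal{N}$ such that $m_p > 0$, $x_{p,1} = y_1$, and $S'_p = \langle x_{p,j} \rangle_{j=2}^{m_p}$,

  – for all $p \in \mathcal{N}$, either

    * $m_p > 0$, $x_{p,1} = y_1$, and $S'_p = \langle x_{p,j} \rangle_{j=2}^{m_p}$, or

    * $S'_p = \langle x_{p,j} \rangle_{j=1}^{m_p}$.

Definition 44. Given a distribution scheme $\mathcal{D} = (\mathcal{N}, \phi, \theta)$, a program $\Pi$ is strongly
$\mathcal{D}$-parallel iff the following hold:

• $\Pi$ is cyclically $\mathcal{D}$-parallel;

• for any program instance $\pi$ of $\Pi$ wrt $F$ and any $\mathcal{D}$-distribution $\mathcal{F}$ of $F$,
letting $\langle \rho_i \rangle_{i=1}^{n}$ be the sequence of rule instances fired when calling $\mathit{infer}(\pi)$,
and letting $\langle \rho_{p,j} \rangle_{j=1}^{m_p}$ be the sequence of rule instances fired in the $p \in \mathcal{N}$
iteration of the call to $\mathit{parinfer}(\mathcal{D}, \pi)$, $\{\langle \rho_{p,j} \rangle_{j=1}^{m_p}\}_{p \in \mathcal{N}}$ interleaves to $\langle \rho_i \rangle_{i=1}^{n}$.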

Definition 44 is the same as definition 42 except with the additional condition
that the rule instance firings of each processor in parallel inference must be able to
interleave to the rule instance firings in sequential inference. Not only then are the
"answers the same" (as required by being cyclically $\mathcal{D}$-parallel), but they can also
be justified in a stronger sense. This notion of correct parallel inference is included
for completeness with respect to previous, related work.
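For finite sequences, the interleaving relation of definition 43 can be tested directly. The sketch below is one possible reading, with hypothetical names (`interleaves`, `seqs`, `target`): at each step of the target sequence, some non-empty set of processors whose next element equals the current target element advances together, which mirrors the "for all $p$, either ... or ..." clause that lets several processors fire the same rule instance at once.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Set, Tuple


def interleaves(seqs: Dict[int, Sequence[Hashable]],
                target: Sequence[Hashable]) -> bool:
    """Return True iff the per-processor sequences interleave to target."""
    procs: List[int] = sorted(seqs)
    # A state records how many elements each processor has consumed so far.
    states: Set[Tuple[int, ...]] = {tuple(0 for _ in procs)}
    for y in target:
        next_states: Set[Tuple[int, ...]] = set()
        for state in states:
            # Processors whose next unconsumed element equals y.
            ready = [i for i, p in enumerate(procs)
                     if state[i] < len(seqs[p]) and seqs[p][state[i]] == y]
            # At least one ready processor must advance; any subset may.
            for size in range(1, len(ready) + 1):
                for chosen in combinations(ready, size):
                    nxt = list(state)
                    for i in chosen:
                        nxt[i] += 1
                    next_states.add(tuple(nxt))
        states = next_states
        if not states:
            return False
    # Base case n = 0: every sequence must be fully consumed.
    return tuple(len(seqs[p]) for p in procs) in states
```

For example, `{0: ["r1", "r3"], 1: ["r2", "r3"]}` interleaves to `["r1", "r2", "r3"]` because both processors may fire `r3` in the same step, whereas `{0: ["r2", "r1"]}` does not interleave to `["r1", "r2"]`. The subset enumeration is exponential in the number of processors, which is acceptable only for small illustrative inputs.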

3.3  Sufficient  Conditions 

Having  established  definitions  for  rules,  inference,  and  parallel  inference,  in  this 
section,  I  look  at  sufficient  conditions  on  distribution  schemes  in  relation  to  rulesets 
under  a  certain  class  of  CRSs  and  halting  conditions  such  that,  when  the  conditions 
are  met,  parallel  inference  is  correct.  To  support  these  theorems,  a  number  of 
additional  definitions  are  introduced  as  well. 

First in section 3.3.1, sufficient conditions are determined for ground rules
only. Then, in section 3.3.2, the conditions from section 3.3.1 are generalized to
rules (ground or otherwise). Generalizing the conditions to (general) rules is important
because it allows for the conditions to be checked by direct inspection of the
rules rather than considering every possible rule instance (of which there could be
infinitely many).

3.3.1  Conditions  on  Rule  Instances 

First consider what it would take for a ground rule $\rho$ that matches a factset $F$
to match some local factset $\mathcal{F}(k)$ where $\mathcal{F}$ is a $\mathcal{D}$-distribution of $F$ and $\mathcal{D} = (\mathcal{N}, \phi, \theta)$.
Clearly, some processor must have all the facts in the (non-negated) portion of the
condition of $\rho$, that is, the facts in $C^{+}(\rho)$. There are two ways this can happen. $\mathcal{D}$
can guarantee that every $f \in C^{+}(\rho)$ is assigned to some common processor, or it can
guarantee that for all but at most one $f \in C^{+}(\rho)$, $f$ is replicated to all processors.
In the latter case, the placement of the one fact $g \in C^{+}(\rho)$ that is not replicated
does not matter because whichever processor has it (and some processor will have it
since $F = \bigcup_{p \in \mathcal{N}} \mathcal{F}(p)$), that processor will also have all of $C^{+}(\rho) \setminus \{g\}$. This intuition
leads to the definition of a ground rule $\rho$ being $\mathcal{D}$-matchable.

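Definition 45. Let $\rho$ be a ground rule, and let $\mathcal{D} = (\mathcal{N}, \phi, \theta)$ be a distribution
scheme. $\rho$ is said to be $\mathcal{D}$-matchable iff one of the following holds:

• $\bigcap_{f \in C^{+}(\rho)} \phi(f) \neq \emptyset$;

• $\bigvee_{g \in C^{+}(\rho)} \bigwedge_{f \in C^{+}(\rho) \setminus \{g\}} [\phi(f) = \mathcal{N}]$.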
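As an illustration, the two conditions of definition 45 can be tested for a single ground rule as follows. This is a sketch only: `is_matchable`, `c_pos`, and the string encoding of facts are assumptions, and the intersection over an empty $C^{+}(\rho)$ is taken to be all of $\mathcal{N}$.

```python
from typing import Callable, FrozenSet, Iterable


def is_matchable(c_pos: Iterable[str],
                 phi: Callable[[str], FrozenSet[int]],
                 procs: FrozenSet[int]) -> bool:
    """Test D-matchability of a ground rule given its positive condition facts."""
    facts = list(c_pos)
    # Condition 1: some processor is required to hold every fact in C+(rho).
    common = set(procs)
    for f in facts:
        common &= phi(f)
    if common:
        return True
    # Condition 2: all facts but at most one are replicated to every processor.
    not_replicated = [f for f in facts if phi(f) != procs]
    return len(facts) >= 1 and len(not_replicated) <= 1


PROCS = frozenset({0, 1, 2})
# Hypothetical scheme: "sub" facts are replicated, all other facts are unconstrained.
phi = lambda f: PROCS if f.startswith("sub(") else frozenset()

join_with_schema = ["sub(Cat,Mammal)", "type(tom,Cat)"]   # one non-replicated fact
join_two_data = ["type(tom,Cat)", "type(tom,Pet)"]        # two non-replicated facts
```

Under this scheme, `join_with_schema` is matchable by the second condition, while `join_two_data` is not, since nothing guarantees that its two facts end up on a common processor.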

However, it is not enough to say that if $\rho$ matches $F$ that it also matches
some processor's local factset. It must also hold that if $\rho$ does not match $F$, then
$\rho$ also does not match any processor's local factset. By proposition 1, there are
two ways that a ground rule does not match a factset $F$. Either $C^{+}(\rho) \not\subseteq F$ or
$C^{\neg}(\rho) \cap F \neq \emptyset$. In the former case, $C^{+}(\rho) \not\subseteq F$ implies $C^{+}(\rho) \not\subseteq \mathcal{F}(p)$ for all $p \in \mathcal{N}$
because $F = \bigcup_{p \in \mathcal{N}} \mathcal{F}(p)$, or put another way, for all $p \in \mathcal{N}$, $\mathcal{F}(p) \subseteq F$. Considering
the latter case, that $C^{\neg}(\rho) \cap F \neq \emptyset$, it must be ensured that for any $p \in \mathcal{N}$, if
$C^{+}(\rho) \subseteq \mathcal{F}(p)$, then $C^{\neg}(\rho) \cap \mathcal{F}(p) \neq \emptyset$. That is, if a processor has the facts matched
by the non-negated portion of the condition of $\rho$, then it should also have the facts
matched by the negated subformulas of $\rho$. Otherwise, a processor may fire $\rho$ even
though it was "blocked" from firing in sequential inference. This intuition leads to
the definition of a ground rule $\rho$ being $\mathcal{D}$-blockable.

Definition 46. Let $\rho$ be a ground rule, and let $\mathcal{D} = (\mathcal{N}, \phi, \theta)$ be a distribution
scheme. $\rho$ is said to be $\mathcal{D}$-blockable iff one of the following holds:

• $C^{\neg}(\rho) = \emptyset$;

• $\bigcap_{f \in C^{+}(\rho)} \theta(f) \subseteq \bigcap_{f \in C^{\neg}(\rho)} \phi(f)$.

Put casually, the second condition of definition 46 says that wherever the
facts in the non-negated portion of the condition of $\rho$ could be placed (together),
the facts in the negated portion of the condition of $\rho$ must also be placed there.
With definitions 45 and 46, the following two lemmas can now be formed.

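Lemma 4. Let $R$ be a ruleset, and let $\mathcal{D} = (\mathcal{N}, \phi, \theta)$ be a distribution scheme. If
every rule instance of a rule in $R$ is $\mathcal{D}$-matchable and $\mathcal{D}$-blockable, then for any
$\mathcal{D}$-distribution $\mathcal{F}$ of a factset $F$, if $\rho \in \mathit{conf}(R, F)$ and $k \in \bigcap_{f \in C^{+}(\rho)} \phi(f)$, then
$\rho \in \mathit{conf}(R, \mathcal{F}(k))$.

Proof. If $\rho \in \mathit{conf}(R, F)$, then by proposition 2, $C^{+}(\rho) \subseteq F$ and $C^{\neg}(\rho) \cap F = \emptyset$. If
$k \in \bigcap_{f \in C^{+}(\rho)} \phi(f)$, then by definition 38, $C^{+}(\rho) \subseteq \mathcal{F}(k)$. Also, if $C^{\neg}(\rho) \cap F = \emptyset$, by
definition 38, $C^{\neg}(\rho) \cap \mathcal{F}(k) = \emptyset$. Therefore, by proposition 2, $\rho \in \mathit{conf}(R, \mathcal{F}(k))$. □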

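Lemma 5. Let $R$ be a ruleset, and let $\mathcal{D} = (\mathcal{N}, \phi, \theta)$ be a distribution scheme. If
every rule instance of a rule in $R$ is $\mathcal{D}$-matchable and $\mathcal{D}$-blockable, then for any
$\mathcal{D}$-distribution $\mathcal{F}$ of a factset $F$, $\mathit{conf}(R, F) = \bigcup_{p \in \mathcal{N}} \mathit{conf}(R, \mathcal{F}(p))$.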

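Proof. First proving that if every rule instance $\rho$ of a rule in $R$ is $\mathcal{D}$-matchable
and $\mathcal{D}$-blockable, then $\mathit{conf}(R, F) \subseteq \bigcup_{p \in \mathcal{N}} \mathit{conf}(R, \mathcal{F}(p))$. By proposition 2, $\rho \in
\mathit{conf}(R, F)$ iff $C^{+}(\rho) \subseteq F$ and $C^{\neg}(\rho) \cap F = \emptyset$. Since $\rho$ is $\mathcal{D}$-matchable, then either

(a) $\bigcap_{f \in C^{+}(\rho)} \phi(f) \neq \emptyset$, or

(b) $\bigvee_{g \in C^{+}(\rho)} \bigwedge_{f \in C^{+}(\rho) \setminus \{g\}} [\phi(f) = \mathcal{N}]$.

Case (a) means there exists some $k \in \bigcap_{f \in C^{+}(\rho)} \phi(f)$, which by
definition 38 means that $C^{+}(\rho) \subseteq \mathcal{F}(k)$. Case (b) means that for some $g \in C^{+}(\rho)$, for all $p \in \mathcal{N}$,
$C^{+}(\rho) \setminus \{g\} \subseteq \mathcal{F}(p)$, and by definition 38, there is some $k \in \mathcal{N}$ such that $g \in \mathcal{F}(k)$.
Therefore, whether case (a) or (b), there is some $k \in \mathcal{N}$ such that $C^{+}(\rho) \subseteq \mathcal{F}(k)$.
Additionally, by definition 38, since $C^{\neg}(\rho) \cap F = \emptyset$, then $C^{\neg}(\rho) \cap \mathcal{F}(k) = \emptyset$. By proposition
2, this means that $\rho \in \mathit{conf}(R, \mathcal{F}(k))$, which means $\rho \in \bigcup_{p \in \mathcal{N}} \mathit{conf}(R, \mathcal{F}(p))$.
Therefore, $\mathit{conf}(R, F) \subseteq \bigcup_{p \in \mathcal{N}} \mathit{conf}(R, \mathcal{F}(p))$.

Now proving that if every rule instance $\rho$ of a rule in $R$ is $\mathcal{D}$-matchable and
$\mathcal{D}$-blockable, then $\mathit{conf}(R, F) \supseteq \bigcup_{p \in \mathcal{N}} \mathit{conf}(R, \mathcal{F}(p))$. $\rho \in \bigcup_{p \in \mathcal{N}} \mathit{conf}(R, \mathcal{F}(p))$
iff there exists $k \in \mathcal{N}$ such that $\rho \in \mathit{conf}(R, \mathcal{F}(k))$. By proposition 2, $\rho \in
\mathit{conf}(R, \mathcal{F}(k))$ iff $C^{+}(\rho) \subseteq \mathcal{F}(k)$ and $C^{\neg}(\rho) \cap \mathcal{F}(k) = \emptyset$. From definition 38, since
$\mathcal{F}(k) \subseteq F$, then $C^{+}(\rho) \subseteq F$.

For a moment, assume the possibility that $C^{\neg}(\rho) \cap F \neq \emptyset$, in which case
$\rho \notin \mathit{conf}(R, F)$, defeating the (real) goal of the proof. By definition 38, $k \in
\bigcap_{f \in C^{+}(\rho)} \theta(f)$, which by definition 46 implies $k \in \bigcap_{f \in C^{\neg}(\rho)} \phi(f)$. By definition 38,
this means that $C^{\neg}(\rho) \cap F \subseteq \mathcal{F}(k)$, which means $C^{\neg}(\rho) \cap \mathcal{F}(k) \neq \emptyset$. However, this
contradicts what has already been established, that $C^{\neg}(\rho) \cap \mathcal{F}(k) = \emptyset$. So it is not
possible that $C^{\neg}(\rho) \cap F \neq \emptyset$.

Therefore, since $C^{+}(\rho) \subseteq F$ and $C^{\neg}(\rho) \cap F = \emptyset$, then by proposition 2, $\rho \in
\mathit{conf}(R, F)$, and $\mathit{conf}(R, F) \supseteq \bigcup_{p \in \mathcal{N}} \mathit{conf}(R, \mathcal{F}(p))$.

Finally, since $\mathit{conf}(R, F) \subseteq \bigcup_{p \in \mathcal{N}} \mathit{conf}(R, \mathcal{F}(p))$ and
$\mathit{conf}(R, F) \supseteq \bigcup_{p \in \mathcal{N}} \mathit{conf}(R, \mathcal{F}(p))$, then $\mathit{conf}(R, F) = \bigcup_{p \in \mathcal{N}} \mathit{conf}(R, \mathcal{F}(p))$. □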
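Lemma 5 can be spot-checked on a small example. The sketch below assumes a hypothetical `GroundRule` record holding only the condition parts $C^{+}(\rho)$ and $C^{\neg}(\rho)$, computes the conflict set as characterized by proposition 2 in the proof above, and contrasts a scheme under which the rule is $\mathcal{D}$-blockable with one under which it is not. All names and the string encoding of facts are illustrative assumptions.

```python
from typing import Callable, Dict, FrozenSet, NamedTuple, Set

Placement = Callable[[str], FrozenSet[int]]


class GroundRule(NamedTuple):
    name: str
    c_pos: FrozenSet[str]  # C+(rho): facts that must be present
    c_neg: FrozenSet[str]  # negated condition: facts that must be absent


def conf(rules: Set[GroundRule], facts: Set[str]) -> Set[GroundRule]:
    """Conflict set: ground rules whose condition matches the factset."""
    return {r for r in rules if r.c_pos <= facts and not (r.c_neg & facts)}


def is_blockable(rule: GroundRule, phi: Placement, theta: Placement,
                 procs: FrozenSet[int]) -> bool:
    """Definition 46: negated facts are required wherever C+ may sit together."""
    if not rule.c_neg:
        return True
    allowed = set(procs)
    for f in rule.c_pos:
        allowed &= theta(f)
    required = set(procs)
    for f in rule.c_neg:
        required &= phi(f)
    return allowed <= required


def parallel_conf(rules: Set[GroundRule],
                  dist: Dict[int, Set[str]]) -> Set[GroundRule]:
    """Union of the conflict sets computed on each local factset."""
    out: Set[GroundRule] = set()
    for local in dist.values():
        out |= conf(rules, local)
    return out


PROCS = frozenset({0, 1})
rule = GroundRule("r", frozenset({"type(tom,Cat)"}), frozenset({"blocked(tom)"}))
F = {"type(tom,Cat)", "blocked(tom)"}
theta_all = lambda f: PROCS

# Scheme 1: the blocking fact is replicated, so the rule is D-blockable.
phi_rep = lambda f: PROCS if f.startswith("blocked") else frozenset()
dist_rep = {0: {"type(tom,Cat)", "blocked(tom)"}, 1: {"blocked(tom)"}}

# Scheme 2: nothing is replicated, so the rule is not D-blockable, and this
# valid distribution lets processor 0 match a rule that is blocked sequentially.
phi_none = lambda f: frozenset()
dist_split = {0: {"type(tom,Cat)"}, 1: {"blocked(tom)"}}
```

In the first scheme the sequential and parallel conflict sets are both empty. In the second, processor 0 matches `rule` locally even though it does not match $F$, which is exactly the failure that $\mathcal{D}$-blockability rules out.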

Lemma 5 provides sufficient conditions for correctly matching rules (i.e., determining
the conflict set) in parallel, but this is far from sufficient for correct parallel
inference (of any form previously defined). The CRS of a program determines which
rule instances in the conflict set are to be fired and in what order. To simplify matters,
I start by considering a class of CRSs that I call AOCs, and then an even more
restricted class I call RAOCs.

Definition 47. An All-and-Ordered CRS (AOC) is a CRS $S$ such that $S$ selects
all the rule instances in the conflict set and orders them according to some total
ordering of rule instances. An AOC effectively ignores any additional information.

Definition 48. A Retractions-first AOC (RAOC) is an AOC $S$ such that for any
ruleset $R$ and factset $F$, for $1 \leq i \leq |S(*, R, F)|$, if the $i$th rule instance $\rho_i$ in
$S(*, R, F)$ is such that $A^{\neg}(\rho_i) = \emptyset$, then for $i < j \leq |S(*, R, F)|$, the $j$th rule
instance $\rho_j$ in $S(*, R, F)$ is such that $A^{\neg}(\rho_j) = \emptyset$.

An AOC may seem like an arbitrary choice in CRSs. Recall, though, that
the purpose of a CRS is to provide developers with an understanding of how inference
progresses [14]. Traditionally, parallel CRSs have either been non-deterministic
(thus defeating the purpose of a CRS) or controlled by advanced mechanisms like
metarules [14]. Although I have specifically chosen RAOCs because they are amenable
to parallelization (to be shown), AOCs in general have other good qualities that
make them a sensible choice. Firstly, they are deterministic due to the total ordering
on rule instances, so one can know for certain how inference will progress.
Secondly, AOCs do not require any additional, complex mechanisms. Perhaps the
greatest drawback, though, is that because AOCs fire the entire conflict set, the
results can be non-intuitive (although not from the perspective of the operational
semantics). For RAOCs, this is similar to using inflationary semantics in Datalog$^{\neg}$
inference [6], so such effects are not entirely unfamiliar.

Since an AOC effectively ignores additional information, to simplify the following,
I will use a $*$ in place of additional information to indicate that any additional
information can be used in that place.

In  addition  to  restricting  the  class  of  CRSs  under  consideration,  I  also  restrict 
the  class  of  rules  under  consideration,  as  given  in  the  following  definitions. 

Definition 49. A polarized ground rule is a ground rule $\rho$ such that either $A^{+}(\rho) = \emptyset$
or $A^{\neg}(\rho) = \emptyset$.

Definition  50.  A  polarized  rule  is  a  rule  such  that  every  instance  of  the  rule  is 
polarized. 

Definition  51.  A  polarized  ruleset  is  a  ruleset  containing  only  polarized  rules. 

Restricting  consideration  to  RAOCs  and  polarized  rulesets  has  the  advantage 
that  in  an  inference  cycle,  all  retractions  will  take  place  before  any  assertions,  thus 
trivializing  the  issue  of  ordering  rule  instances  within  a  single  cycle.  Within  a  single 
cycle,  no  assertion  can  be  “undone”  by  a  retraction,  although  a  retraction  can  be 
“undone”  by  an  assertion.  However,  any  such  “undoing”  is  part  of  the  semantics  of 
a  RAOC,  and  therefore,  the  compatibility  problem  [9]  is  trivially  resolved.  Lemma 
6  formally  captures  this  characteristic  and  draws  useful  conclusions  from  it. 

Lemma 6. Let $S$ be a RAOC, let $R$ be a polarized ruleset, and let $F$ be a factset.
$f \in \mathit{cycle}(*, S, R, F)$ iff one of the following holds:

• $f \in F$ and for all $\rho \in \mathit{conf}(R, F)$, $f \notin A^{\neg}(\rho)$;

• there exists $\rho \in \mathit{conf}(R, F)$ such that $f \in A^{+}(\rho)$.

Proof. First proving that if $f \in F$ and for all $\rho \in \mathit{conf}(R, F)$, $f \notin A^{\neg}(\rho)$, then
$f \in \mathit{cycle}(*, S, R, F)$. Straightforwardly, if $f \in F$ and there is no rule instance
$\rho \in \mathit{conf}(R, F)$ that retracts $f$, then $f$ will not be retracted from $F$ when performing
an inference cycle and $f \in \mathit{cycle}(*, S, R, F)$.

Now proving that if there exists $\rho \in \mathit{conf}(R, F)$ such that $f \in A^{+}(\rho)$, then
$f \in \mathit{cycle}(*, S, R, F)$. By definition 48, $\rho$ will be fired, and since $R$ is a polarized
ruleset, $A^{\neg}(\rho) = \emptyset$, which by definition 48 means that no rule instance will be fired
after $\rho$ that retracts facts. Therefore, once $f$ is asserted by $\rho$, it will remain in the
factset until the end of the cycle, and so $f \in \mathit{cycle}(*, S, R, F)$.

Finally proving that if $f \in \mathit{cycle}(*, S, R, F)$, then: $f \in F$ and for all $\rho \in
\mathit{conf}(R, F)$, $f \notin A^{\neg}(\rho)$; or there exists $\rho \in \mathit{conf}(R, F)$ such that $f \in A^{+}(\rho)$.
Proof by contradiction. Assume $f \in \mathit{cycle}(*, S, R, F)$; $f \notin F$ or there exists $\rho \in
\mathit{conf}(R, F)$ such that $f \in A^{\neg}(\rho)$; and for all $\rho \in \mathit{conf}(R, F)$, $f \notin A^{+}(\rho)$. Let $F'$ be
the factset in algorithm 1 after all the retraction actions have been fired (which exists
by definition 48 given that $R$ is a polarized ruleset). Then since either $f \notin F$ or there
exists $\rho \in \mathit{conf}(R, F)$ such that $f \in A^{\neg}(\rho)$, then $f \notin F'$. By definition 48, only
assertion actions remain to be fired, but since for all $\rho \in \mathit{conf}(R, F)$, $f \notin A^{+}(\rho)$,
then $f$ will not be asserted, and $f \notin \mathit{cycle}(*, S, R, F)$. This is a contradiction. □

Corollary 7. Let $S$ be a RAOC, let $R$ be a polarized ruleset, and let $F$ be a factset.
$f \notin \mathit{cycle}(*, S, R, F)$ iff the following hold:

• for all $\rho \in \mathit{conf}(R, F)$, $f \notin A^{+}(\rho)$;

• if $f \in F$, then there exists $\rho \in \mathit{conf}(R, F)$ such that $f \in A^{\neg}(\rho)$.

Proof. Follows directly from lemma 6. □
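The retractions-first behaviour behind lemma 6 can be illustrated with a small sketch for polarized ground rules. It is an assumption-laden model rather than the thesis's algorithm 1: the `GroundRule` record and its field names are hypothetical, and because every rule is polarized the order among retracting rules and among asserting rules does not affect the resulting set, so the total ordering of the RAOC is not modelled.

```python
from typing import FrozenSet, NamedTuple, Set


class GroundRule(NamedTuple):
    name: str
    c_pos: FrozenSet[str]  # facts that must be present
    c_neg: FrozenSet[str]  # facts that must be absent
    a_pos: FrozenSet[str]  # facts asserted when the rule fires
    a_neg: FrozenSet[str]  # facts retracted when the rule fires


def conf(rules: Set[GroundRule], facts: Set[str]) -> Set[GroundRule]:
    """Conflict set: ground rules whose condition matches the factset."""
    return {r for r in rules if r.c_pos <= facts and not (r.c_neg & facts)}


def cycle(rules: Set[GroundRule], facts: Set[str]) -> Set[str]:
    """One cycle: fire the whole conflict set, all retractions before assertions."""
    fired = conf(rules, facts)
    result = set(facts)
    for r in fired:
        result -= r.a_neg
    for r in fired:
        result |= r.a_pos
    return result


def lemma6_characterization(rules: Set[GroundRule], facts: Set[str]) -> Set[str]:
    """Facts predicted by lemma 6: never-retracted old facts plus asserted facts."""
    fired = conf(rules, facts)
    kept = {f for f in facts if all(f not in r.a_neg for r in fired)}
    asserted = {f for r in fired for f in r.a_pos}
    return kept | asserted


E: FrozenSet[str] = frozenset()
rules = {
    GroundRule("retract_b", frozenset({"a"}), E, E, frozenset({"b"})),
    GroundRule("assert_bc", frozenset({"a"}), E, frozenset({"b", "c"}), E),
    GroundRule("blocked", frozenset({"a"}), frozenset({"b"}), frozenset({"d"}), E),
}
F = {"a", "b"}
```

In this example `b` is retracted and then asserted again within the same cycle, so it survives, while `d` is never asserted because the rule named `blocked` does not match $F$.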

In the context of RAOCs, consider parallelization of rule firing (having already
addressed rule matching). Intuitively, if a fact $f$ is retracted by a rule instance $\rho$,
then $\rho$ must be fired on every processor on which $f$ might exist. This intuition leads
to the following definition.

Definition 52. Let $\rho$ be a ground rule, and let $\mathcal{D} = (\mathcal{N}, \phi, \theta)$ be a distribution
scheme. $\rho$ is said to be $\mathcal{D}$-retractable iff $\bigcup_{f \in A^{\neg}(\rho)} \theta(f) \subseteq \bigcap_{f \in C^{+}(\rho)} \phi(f)$.

Although I refer to such rules as "$\mathcal{D}$-retractable", it is a bit of a misnomer.
Whether the retraction of a fact actually succeeds depends in part on the CRS
independent of $\mathcal{D}$, and so even if a ground rule $\rho$ is $\mathcal{D}$-retractable, it is not guaranteed
that the facts retracted by $\rho$ will indeed be retracted. Regardless, the term
$\mathcal{D}$-retractable is convenient for its brevity, and it should be understood (as explicitly
given in the theorems) that RAOCs are the CRSs under consideration (unless stated
otherwise).

Lemma 8 says that when every possible instance of a rule in $R$ is $\mathcal{D}$-matchable,
$\mathcal{D}$-blockable, and $\mathcal{D}$-retractable, then a single inference cycle will produce the same
result in parallel as in sequential, when the facts are $\mathcal{D}$-distributed.

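Lemma 8. Let $R$ be a polarized ruleset, let $S$ be a RAOC, and let $\mathcal{D} = (\mathcal{N}, \phi, \theta)$ be
a distribution scheme. If every instance of a rule in $R$ is $\mathcal{D}$-matchable, $\mathcal{D}$-blockable,
and $\mathcal{D}$-retractable; then for any $\mathcal{D}$-distribution $\mathcal{F}$ of a factset $F$, $\mathit{cycle}(*, S, R, F) =
\bigcup_{p \in \mathcal{N}} \mathit{cycle}(*, S, R, \mathcal{F}(p))$.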

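Proof. First proving that $\mathit{cycle}(*, S, R, F) \supseteq \bigcup_{p \in \mathcal{N}} \mathit{cycle}(*, S, R, \mathcal{F}(p))$. By corollary
7, $f \notin \mathit{cycle}(*, S, R, F)$ implies that for all $\rho \in \mathit{conf}(R, F)$, $f \notin A^{+}(\rho)$. By lemma
5, this also means that for all $p \in \mathcal{N}$, for all $\rho \in \mathit{conf}(R, \mathcal{F}(p))$, $f \notin A^{+}(\rho)$.

Again, by corollary 7, $f \notin \mathit{cycle}(*, S, R, F)$ implies that if $f \in F$, then there
exists $\rho \in \mathit{conf}(R, F)$ such that $f \in A^{\neg}(\rho)$. Fix $\rho$ accordingly. By definition
38, $f \in F$ implies that there exists $k \in \mathcal{N}$ such that $f \in \mathcal{F}(k)$. Fix $k$ accordingly.
By definition 37, this also means that $k \in \theta(f)$. Then by definition 52,
$k \in \bigcap_{g \in C^{+}(\rho)} \phi(g)$, and by lemma 4, $\rho \in \mathit{conf}(R, \mathcal{F}(k))$. $k$ is fixed such that it is any
$k \in \mathcal{N}$ such that $f \in \mathcal{F}(k)$. Therefore, it holds that for any $p \in \mathcal{N}$, if $f \in \mathcal{F}(p)$, then
$\rho \in \mathit{conf}(R, \mathcal{F}(p))$. Additionally, at the beginning of the proof, it was established
that for all $\rho' \in \mathit{conf}(R, \mathcal{F}(k))$, $f \notin A^{+}(\rho')$. By lemma 6, this means that for all
$p \in \mathcal{N}$, $f \notin \mathit{cycle}(*, S, R, \mathcal{F}(p))$.

Therefore $\mathit{cycle}(*, S, R, F) \supseteq \bigcup_{p \in \mathcal{N}} \mathit{cycle}(*, S, R, \mathcal{F}(p))$.

Now to prove that $\mathit{cycle}(*, S, R, F) \subseteq \bigcup_{p \in \mathcal{N}} \mathit{cycle}(*, S, R, \mathcal{F}(p))$. If
$f \notin \bigcup_{p \in \mathcal{N}} \mathit{cycle}(*, S, R, \mathcal{F}(p))$, then for all $p \in \mathcal{N}$, $f \notin \mathit{cycle}(*, S, R, \mathcal{F}(p))$. By
corollary 7, this means that for all $p \in \mathcal{N}$, for all $\rho \in \mathit{conf}(R, \mathcal{F}(p))$, $f \notin A^{+}(\rho)$,
or put another way, for all $\rho \in \bigcup_{p \in \mathcal{N}} \mathit{conf}(R, \mathcal{F}(p))$, $f \notin A^{+}(\rho)$. By lemma 5, this
means that for all $\rho \in \mathit{conf}(R, F)$, $f \notin A^{+}(\rho)$.

Suppose that for some $k \in \mathcal{N}$, $f \in \mathcal{F}(k)$. Fix $k$ accordingly. By definition 38,
$f \in F$ iff such $k$ exists. By corollary 7, there exists some $\rho \in \mathit{conf}(R, \mathcal{F}(k))$ such
that $f \in A^{\neg}(\rho)$. Fix $\rho$ accordingly. By lemma 5, since $\rho \in \mathit{conf}(R, \mathcal{F}(k))$, then
$\rho \in \mathit{conf}(R, F)$. Therefore, when $f \in F$, there exists $\rho \in \mathit{conf}(R, F)$ such that
$f \in A^{\neg}(\rho)$.

So having established that for all $\rho' \in \mathit{conf}(R, F)$, $f \notin A^{+}(\rho')$, and if $f \in F$,
then there exists $\rho \in \mathit{conf}(R, F)$ such that $f \in A^{\neg}(\rho)$, then by corollary 7, $f \notin
\mathit{cycle}(*, S, R, F)$.

Therefore $\mathit{cycle}(*, S, R, F) \subseteq \bigcup_{p \in \mathcal{N}} \mathit{cycle}(*, S, R, \mathcal{F}(p))$.

Having shown $\mathit{cycle}(*, S, R, F) \subseteq \bigcup_{p \in \mathcal{N}} \mathit{cycle}(*, S, R, \mathcal{F}(p))$ and
$\mathit{cycle}(*, S, R, F) \supseteq \bigcup_{p \in \mathcal{N}} \mathit{cycle}(*, S, R, \mathcal{F}(p))$, it holds that $\mathit{cycle}(*, S, R, F) =
\bigcup_{p \in \mathcal{N}} \mathit{cycle}(*, S, R, \mathcal{F}(p))$. □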

Now  it  has  been  shown  that  a  single  inference  cycle  is  correct  in  parallel  for 
the  conditions  of  lemma  8.  However,  this  is  still  insufficient  for  the  entire  inference 
process.  It  needs  to  be  shown  that  every  cycle  is  correct,  not  just  a  single  cycle 
under  certain  conditions. 

Consider, though, if it could be shown that after each parallel cycle, the facts
are still $\mathcal{D}$-distributed. Then lemma 8 would mean that the next cycle is also
correct, and the cycle after that, and so on. Enforcing $\theta$ is easy because it is built
in to the parallel inference algorithm at line 9 of algorithm 3. However, it is not
as straightforward to enforce $\phi$. Suppose, though, that whichever processors must
have an inferred fact are also some of the processors that infer that fact. Then
$\mathcal{D}$-distribution would be preserved between cycles. This leads to the following definition
and lemma.

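Definition 53. Let $\rho$ be a ground rule, and let $\mathcal{D} = (\mathcal{N}, \phi, \theta)$ be a distribution
scheme. $\rho$ is said to be $\mathcal{D}$-preserving iff $\bigcup_{f \in A^{+}(\rho)} \phi(f) \subseteq \bigcap_{f \in C^{+}(\rho)} \phi(f)$.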
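Definition 53 can likewise be checked as a subset test. The sketch below uses hypothetical names (`is_preserving`, `c_pos`, `a_pos`) and an assumed scheme that replicates `sub` facts and leaves `type` facts unconstrained.

```python
from typing import Callable, FrozenSet, Iterable


def is_preserving(c_pos: Iterable[str], a_pos: Iterable[str],
                  phi: Callable[[str], FrozenSet[int]],
                  procs: FrozenSet[int]) -> bool:
    """Definition 53: processors that must receive an asserted fact can infer it."""
    must_receive = set()       # processors required to hold some asserted fact
    for f in a_pos:
        must_receive |= phi(f)
    can_infer = set(procs)     # processors required to hold all of C+(rho)
    for f in c_pos:
        can_infer &= phi(f)
    return must_receive <= can_infer


PROCS = frozenset({0, 1})
# Hypothetical scheme: "sub" facts are replicated, "type" facts go anywhere.
phi = lambda f: PROCS if f.startswith("sub(") else frozenset()

# type(tom,Cat), sub(Cat,Mammal) -> type(tom,Mammal): nothing must be replicated.
typing_rule = (["type(tom,Cat)", "sub(Cat,Mammal)"], ["type(tom,Mammal)"])
# sub(A,B), sub(B,C) -> sub(A,C): inferred everywhere because inputs are everywhere.
transitivity = (["sub(A,B)", "sub(B,C)"], ["sub(A,C)"])
# type(tom,Cat) -> sub(Cat,Pet): a replicated fact inferred from a local one.
leaky = (["type(tom,Cat)"], ["sub(Cat,Pet)"])
```

The first two rule instances are preserving under this scheme. The third is not: `sub(Cat,Pet)` must be on every processor, but only the processor that happens to hold `type(tom,Cat)` is able to infer it.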
Lemma 9. Let $R$ be a polarized ruleset, let $S$ be a RAOC, and let $\mathcal{D} = (\mathcal{N}, \phi, \theta)$ be
a distribution scheme. If every instance of a rule in $R$ is $\mathcal{D}$-matchable, $\mathcal{D}$-blockable,
$\mathcal{D}$-retractable, and $\mathcal{D}$-preserving; then for any $\mathcal{D}$-distribution $\mathcal{F}$ of a factset $F$, $\mathcal{F}' =
\{(p, F'_p) \mid F'_p = \{f \mid f \in \mathit{cycle}(*, S, R, \mathcal{F}(p)) \wedge p \in \theta(f)\}\}_{p \in \mathcal{N}}$ is a $\mathcal{D}$-distribution of
$\mathit{cycle}(*, S, R, F)$.

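Proof. By definition 38, it must be shown that:

(a) if $f \in \mathit{cycle}(*, S, R, F)$ and $k \in \phi(f)$, then $f \in \mathcal{F}'(k)$;

(b) if $f \in \mathit{cycle}(*, S, R, F)$ and $k \notin \theta(f)$, then $f \notin \mathcal{F}'(k)$;

(c) $\mathit{cycle}(*, S, R, F) = \bigcup_{p \in \mathcal{N}} \mathcal{F}'(p)$.

Starting with condition (a). By lemma 6, $f \in \mathit{cycle}(*, S, R, F)$ iff one of the
following holds:

(d) $f \in F$ and for all $\rho \in \mathit{conf}(R, F)$, $f \notin A^{\neg}(\rho)$;

(e) there exists $\rho \in \mathit{conf}(R, F)$ such that $f \in A^{+}(\rho)$.

Starting with case (d), since $f \in F$ and $k \in \phi(f)$, then by definition 38,
$f \in \mathcal{F}(k)$. Since for all $\rho \in \mathit{conf}(R, F)$, $f \notin A^{\neg}(\rho)$, by lemma 5, for all $\rho \in
\mathit{conf}(R, \mathcal{F}(k))$, $f \notin A^{\neg}(\rho)$. Then by lemma 6, $f \in \mathit{cycle}(*, S, R, \mathcal{F}(k))$. By definition
37, since $k \in \phi(f)$, then $k \in \theta(f)$. Therefore, $f \in \mathcal{F}'(k)$.

Now turning to case (e), since there exists $\rho \in \mathit{conf}(R, F)$ such that $f \in A^{+}(\rho)$
and $k \in \phi(f)$, then by definition 53, $k \in \bigcap_{g \in C^{+}(\rho)} \phi(g)$. By lemma 4, this means that
$\rho \in \mathit{conf}(R, \mathcal{F}(k))$. Then by lemma 6, $f \in \mathit{cycle}(*, S, R, \mathcal{F}(k))$. Since $k \in \phi(f)$,
then by definition 37, $k \in \theta(f)$, and so $f \in \mathcal{F}'(k)$.

Condition (a) holds.

Condition (b) is trivially true. Regardless of whether $f \in \mathit{cycle}(*, S, R, F)$, by
definition of $\mathcal{F}'$, if $k \notin \theta(f)$, then $f \notin \mathcal{F}'(k)$.

As for condition (c), lemma 8 already shows that $\mathit{cycle}(*, S, R, F) =
\bigcup_{p \in \mathcal{N}} \mathit{cycle}(*, S, R, \mathcal{F}(p))$, and by definition of $\mathcal{F}'$, for all $p \in \mathcal{N}$, $\mathcal{F}'(p) \subseteq
\mathit{cycle}(*, S, R, \mathcal{F}(p))$. Therefore, it holds that $\bigcup_{p \in \mathcal{N}} \mathcal{F}'(p) \subseteq \mathit{cycle}(*, S, R, F)$, and
it remains to be shown that $\bigcup_{p \in \mathcal{N}} \mathcal{F}'(p) \supseteq \mathit{cycle}(*, S, R, F)$. By lemma 6, $f \in
\mathit{cycle}(*, S, R, F)$ iff one of case (d) or case (e) holds.

Starting with case (d), since $f \in F$, then by definition 38, there exists $k \in \mathcal{N}$
such that $f \in \mathcal{F}(k)$ and $k \in \theta(f)$. By case (d), for all $\rho \in \mathit{conf}(R, F)$, $f \notin A^{\neg}(\rho)$,
which by lemma 5 means that for all $\rho \in \mathit{conf}(R, \mathcal{F}(k))$, $f \notin A^{\neg}(\rho)$. Therefore, by
lemma 6, $f \in \mathit{cycle}(*, S, R, \mathcal{F}(k))$. Having already established that $k \in \theta(f)$, then
$f \in \mathcal{F}'(k) \subseteq \bigcup_{p \in \mathcal{N}} \mathcal{F}'(p)$.

Turning to case (e), since there exists $\rho \in \mathit{conf}(R, F)$ such that $f \in A^{+}(\rho)$,
then by lemma 5 there exists $k \in \mathcal{N}$ such that $\rho \in \mathit{conf}(R, \mathcal{F}(k))$, which by lemma
6 means that $f \in \mathit{cycle}(*, S, R, \mathcal{F}(k))$. Now if $\phi(f) = \emptyset$, then by definition 37,
$\theta(f) = \mathcal{N}$ and therefore $f \in \mathcal{F}'(k)$. If $\phi(f) \neq \emptyset$, then there exists $l \in \phi(f)$,
which by definition 53 means that $l \in \bigcap_{g \in C^{+}(\rho)} \phi(g)$, which by lemma 4 means that
$\rho \in \mathit{conf}(R, \mathcal{F}(l))$, which by lemma 6 means that $f \in \mathit{cycle}(*, S, R, \mathcal{F}(l))$. Since
it has already been established that $l \in \phi(f)$, then by definition 37, $l \in \theta(f)$, and
therefore $f \in \mathcal{F}'(l)$. Either way, $f \in \bigcup_{p \in \mathcal{N}} \mathcal{F}'(p)$.

Therefore, $\bigcup_{p \in \mathcal{N}} \mathcal{F}'(p) = \mathit{cycle}(*, S, R, F)$, and condition (c) holds.

Since (a), (b), and (c) hold, then $\mathcal{F}'$ is a $\mathcal{D}$-distribution of $\mathit{cycle}(*, S, R, F)$. □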

Now  with  the  support  of  the  recently  stated  lemmas,  the  first  main  theorem 
can  be  concluded  and  proven  inductively  as  discussed,  but  first,  I  assume  a  particular 
(kind  of)  halting  condition. 

Definition 54. A fixpoint halting condition, denoted $\mathcal{H}_{fix}$, is (logically) defined as
$\mathcal{H}_{fix}(I, S, R, F) = [F = \mathit{cycle}(I, S, R, F)]$.

Theorem 10. Let $R$ be a polarized ruleset, let $S$ be a RAOC, let $\mathcal{I}$ be any information
keeper, and let $\mathcal{D} = (\mathcal{N}, \phi, \theta)$ be a distribution scheme. If every instance
of a rule in $R$ is $\mathcal{D}$-matchable, $\mathcal{D}$-blockable, $\mathcal{D}$-retractable, and $\mathcal{D}$-preserving; then
program $\Pi = (\mathcal{I}, \mathcal{H}_{fix}, S, R)$ is cyclically $\mathcal{D}$-parallel.

Proof. It must be shown that for any instance $\pi$ of $\Pi$ wrt $F$, for all $i \geq 1$,
letting $F_i$ be the $i$th factset when calling $\mathit{infer}(\pi)$ and letting $F_{p,i}$ be the
$i$th factset for processor $p$ when calling $\mathit{parinfer}(\mathcal{D}, \pi)$, $F_i = \bigcup_{p \in \mathcal{N}} F_{p,i}$. Proof is by
induction.

Let $i = 1$ be the base case. Since $\mathcal{F}$ is a $\mathcal{D}$-distribution of $F$, then by lemma
9, $\{(p, F_{p,1})\}_{p \in \mathcal{N}}$ is a $\mathcal{D}$-distribution of $F_1$.

Now as the inductive step, assume that for any $i$, $\{(p, F_{p,i})\}_{p \in \mathcal{N}}$ is a
$\mathcal{D}$-distribution of $F_i$. Then by lemma 9, $\{(p, F_{p,i+1})\}_{p \in \mathcal{N}}$ is a $\mathcal{D}$-distribution of $F_{i+1}$.

So for all $i \geq 1$, $\{(p, F_{p,i})\}_{p \in \mathcal{N}}$ is a $\mathcal{D}$-distribution of $F_i$, which by definition 38
implies that for $i \geq 1$, $F_i = \bigcup_{p \in \mathcal{N}} F_{p,i}$. □

Corollary 11. In addition to the conditions of theorem 10, if $\Pi$ terminates, then
$\Pi$ is weakly $\mathcal{D}$-parallel.

Proof.  Follows  directly  from  theorem  10  and  proposition  3.  □ 

Turning to the possibility of inference being strongly $\mathcal{D}$-parallel, the following
conjectures  are  given.  These  are  presented  as  conjectures  rather  than  complete 
theorems  because  their  proofs  are  sketches.  Providing  complete  proofs  remains  as 
future  work. 

Conjecture 12. Let R be a ruleset, let S be an AOC, and let $\mathcal{D} = (\mathcal{N}, \phi, \theta)$ be
a distribution scheme. If every rule instance of a rule in R is $\mathcal{D}$-matchable and
$\mathcal{D}$-blockable, then for any $\mathcal{D}$-distribution $\mathcal{F}$ of a factset F, $\{S(*, R, \mathcal{F}(p))\}_{p \in \mathcal{N}}$
interleaves to $S(*, R, F)$.

Proof (sketch). By lemma 5, $conf(R, F) = \bigcup_{p \in \mathcal{N}} conf(R, \mathcal{F}(p))$. Then, by definition
47, since S selects all the rule instances in the conflict set and orders them
according to a total ordering of rule instances, $\{S(*, R, \mathcal{F}(p))\}_{p \in \mathcal{N}}$ interleaves
to $S(*, R, F)$. □

Conjecture 13. Let R be a polarized ruleset, let S be a RAOC, let $\mathcal{I}$ be any
information keeper, and let $\mathcal{D} = (\mathcal{N}, \phi, \theta)$ be a distribution scheme. If every instance
of a rule in R is $\mathcal{D}$-matchable, $\mathcal{D}$-blockable, $\mathcal{D}$-retractable, and $\mathcal{D}$-preserving, then
program $\Pi = (\mathcal{I}, \mathcal{H}_{fix}, S, R)$ is strongly $\mathcal{D}$-parallel.

Proof (sketch). By theorem 10, $\Pi$ is cyclically parallel. Recall from the proof of
theorem 10 (and using the same notation), for any $i \geq 1$, $\{(p, F_{p,i})\}_{p \in \mathcal{N}}$ is a
$\mathcal{D}$-distribution of $F_i$. Then by conjecture 12, in every individual cycle, the sequences
of rule instances fired in parallel interleave to the sequence of rule instances fired
sequentially. Therefore, chaining the parallel sequences together over the cycles and
chaining  the  sequential  sequences  together  over  the  cycles,  the  parallel  sequences 
interleave  to  the  sequential  sequence.  □ 

3.3.2  Conditions  on  Rules 

The theorems of the previous section are only starting points. In and
of themselves, they are not very useful because the conditions are placed on ground
rules rather than (general) rules. Given a large enough domain and number of rules
(and it would not seem to require many), checking each individual rule instance
becomes utterly impractical. Therefore in this section, I generalize the theorems of
the previous section to rules. First, though, a notion of “pattern” is needed.

Definition 55. A restriction is a negated equality formula Not(v = t) where v is
a non-ground term (a variable or a function term containing variables) and t is a
ground term. A restriction x = Not(v = t) is said to restrict a formula f iff v
occurs in f.

Definition 56. A pattern is a conjunction formula $And(f\ x_1 \ldots x_n)$ where f is
an atomic formula, $n \geq 0$, and each $x_i$ for $1 \leq i \leq n$ is a restriction that restricts f.

The  notion  of  a  restriction  is  rather  simple.  It  merely  states  that  a  variable 
cannot  be  bound  to  some  specific  value.  A  pattern  is  then  just  an  atomic  formula 
with  an  associated  set  of  restrictions.  This  particular  idea  of  “pattern”  is  important 
because  it  will  facilitate  the  ability  to  restrict  rules  to  improve  parallelism,  discussed 
in  chapter  4. 

Notation. For any pattern P, let $T(P) = \{f \mid P \text{ matches } F_I \cup \{f\}\}$ where $F_I$ is
the set of all independent facts subsumed by every factset. A fact f is said to be
matched by a pattern P iff $f \in T(P)$.

Although briefly introduced, this particular notation is of great importance,
and its full meaning should be well understood. For a pattern P, T(P) represents
the set of all facts that are instances of the atomic formula in P satisfying
the associated restrictions of P. Note that although according to the operational
semantics, a pattern P = And(_a[_b->?x] Not(?x=_c)) can match a
factset {_a[_b->_d], _e=_f} ∪ $F_I$, only _a[_b->_d] ∈ T(P) because P can match
{_a[_b->_d]} ∪ $F_I$ but not {_e=_f} ∪ $F_I$.
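
As a rough illustration of membership in T(P), the following Python sketch tests whether a single fact is matched by a pattern. It assumes a simplified encoding that is not part of the thesis: facts and atomic formulas are tuples of terms, variables are strings beginning with `?`, and each restriction is a (variable, forbidden value) pair. The function names are hypothetical, and equality facts and the independent facts $F_I$ are ignored.

```python
def is_var(term):
    """Variables are encoded as strings starting with '?' (an assumption of this sketch)."""
    return isinstance(term, str) and term.startswith("?")


def matched_by(fact, atom, restrictions):
    """Return True if `fact` is an instance of `atom` whose variable bindings
    violate none of the (variable, forbidden_value) restrictions."""
    if len(fact) != len(atom):
        return False
    binding = {}
    for fact_term, atom_term in zip(fact, atom):
        if is_var(atom_term):
            # A repeated variable must bind to the same value each time.
            if binding.setdefault(atom_term, fact_term) != fact_term:
                return False
        elif atom_term != fact_term:
            return False
    return all(binding.get(var) != value for var, value in restrictions)


# The pattern And(_a[_b->?x] Not(?x=_c)) from the text, in the simplified encoding.
atom = ("_a", "_b", "?x")
restrictions = [("?x", "_c")]
print(matched_by(("_a", "_b", "_d"), atom, restrictions))
print(matched_by(("_a", "_b", "_c"), atom, restrictions))
```

The first call corresponds to the fact _a[_b->_d], which satisfies the restriction; the second binds ?x to the forbidden value _c.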

More notation is defined in the following, but essentially X(f) is the set of
restrictions occurring in a formula f, and $\mathcal{P}_C^+(r)$, $\mathcal{P}_C^-(r)$, $\mathcal{P}_A^+(r)$, and $\mathcal{P}_A^-(r)$ are the
sets of patterns derived from $\mathcal{C}^+(r)$, $\mathcal{C}^-(r)$, $\mathcal{A}^+(r)$, and $\mathcal{A}^-(r)$, respectively, for any
rule r.

Notation. For a condition formula f, let X(f) denote the set of restrictions defined
as follows:

• if f is an atomic formula, $X(f) = \emptyset$;

• if f is a negated formula that is not a restriction, $X(f) = \emptyset$;

• if f is a restriction, $X(f) = \{f\}$;

• if f is a conjunction $And(f_1 \ldots f_n)$, then $X(f) = \bigcup_{i=1}^{n} X(f_i)$.
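
The four cases above translate directly into a recursive function. The sketch below assumes a hypothetical tuple encoding of condition formulas, `("atom", ...)`, `("not", inner)`, `("restriction", variable, value)`, and `("and", f1, ..., fn)`, chosen only for illustration.

```python
def restrictions_of(formula):
    """Collect X(f): the restrictions occurring in a condition formula."""
    kind = formula[0]
    if kind == "restriction":
        return {formula}
    if kind == "and":
        collected = set()
        for sub in formula[1:]:
            collected |= restrictions_of(sub)
        return collected
    # Atomic formulas and negated formulas that are not restrictions contribute nothing.
    return set()


condition = (
    "and",
    ("atom", "_a", "_b", "?x"),
    ("restriction", "?x", "_c"),
    ("not", ("atom", "?x", "_p", "_q")),
)
print(restrictions_of(condition))
```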

Notation. For a condition formula f,

• $P \in \mathcal{P}_C^+(f)$ iff $P = And(f'\ x_1 \ldots x_n)$ for some $f' \in \mathcal{C}^+(f)$ where $\{x_i\}_{i=1}^{n}$ is
the maximum subset of X(f) such that each $x_i$ restricts $f'$;

• $P \in \mathcal{P}_C^-(f)$ iff $P = And(f'\ x_1 \ldots x_n)$ for some $f' \in \mathcal{C}^-(f)$ where $\{x_i\}_{i=1}^{n}$ is
the maximum subset of X(f) such that each $x_i$ restricts $f'$.

Notation. For a rule r = If f Then a:

• $\mathcal{P}_C^+(r) = \mathcal{P}_C^+(f)$;

• $\mathcal{P}_C^-(r) = \mathcal{P}_C^-(f)$;

• $P \in \mathcal{P}_A^+(r)$ iff $P = And(g\ x_1 \ldots x_m)$ for some $g \in \mathcal{A}^+(r)$ and $\{x_j\}_{j=1}^{m}$ is the
maximum subset of X(f) such that each $x_j$ restricts g;

• $P \in \mathcal{P}_A^-(r)$ iff $P = And(g\ x_1 \ldots x_m)$ for some $g \in \mathcal{A}^-(r)$ and $\{x_j\}_{j=1}^{m}$ is the
maximum subset of X(f) such that each $x_j$ restricts g.

Notation. For a rule r, let $\Lambda(r)$ denote the set of all matchable rule instances of r.

The  following  lemma  proves  what  is  rather  intuitive,  that  the  facts  in  a  rule 
instance  match  patterns  in  the  rule  from  which  the  rule  instance  is  derived.  However, 
to  be  thorough  and  to  ensure  consistency  in  the  notation,  proving  the  lemma  is  a 
necessary  step. 

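Lemma 14. For any rule r, the following hold:

• $\bigcup_{\rho \in \Lambda(r)} \mathcal{C}^+(\rho) \subseteq \bigcup_{P \in \mathcal{P}_C^+(r)} T(P)$;

• $\bigcup_{\rho \in \Lambda(r)} \mathcal{C}^-(\rho) \subseteq \bigcup_{P \in \mathcal{P}_C^-(r)} T(P)$;

• $\bigcup_{\rho \in \Lambda(r)} \mathcal{A}^+(\rho) \subseteq \bigcup_{P \in \mathcal{P}_A^+(r)} T(P)$;

• $\bigcup_{\rho \in \Lambda(r)} \mathcal{A}^-(\rho) \subseteq \bigcup_{P \in \mathcal{P}_A^-(r)} T(P)$.

Proof. If $f \in \bigcup_{\rho \in \Lambda(r)} \mathcal{C}^+(\rho)$, then there exists $\rho \in \Lambda(r)$ such that $f \in \mathcal{C}^+(\rho)$. By
definition 22, this means that there exists $f' \in \mathcal{C}^+(r)$ and ground substitution $\sigma$
such that $f = \sigma(f')$. Since $f' \in \mathcal{C}^+(r)$, this means that there exists a pattern $P =
And(f' \ldots) \in \mathcal{P}_C^+(r)$. Now if P contains any restrictions on $f'$, then by definition
of $\mathcal{P}_C^+(r)$, such restrictions must also be in X(r). Since $\rho$ is matchable, then $\sigma$ is
such that all of the restrictions x on $f'$ are such that $\sigma(x) = Not(t_1 = t_2)$ where
$t_1$ and $t_2$ are different ground terms. Therefore, the ground formula $\sigma(P)$ matches
$\{\sigma(f')\} \cup F_I = \{f\} \cup F_I$ (where $F_I$ is the set of independent facts subsumed by
every factset), which means that since such a $\sigma$ exists, P matches $\{f\} \cup F_I$, and so
$f \in T(P)$. Similar arguments for $\mathcal{C}^-(\rho)$, $\mathcal{A}^+(\rho)$, and $\mathcal{A}^-(\rho)$ with $\mathcal{P}_C^-(r)$, $\mathcal{P}_A^+(r)$, and
$\mathcal{P}_A^-(r)$, respectively. □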

At  this  point,  definitions  are  introduced  that  will  help  to  generalize  away  from 
distribution  schemes  that  assign  facts  to  processors,  to  pattern  assignments  that 
assign  facts  to  processors  based  on  which  patterns  match  them.  In  other  words, 
data distribution is no longer performed on a fact-by-fact basis but a pattern-by-pattern
basis. This is significantly more practical because no person will likely ever
decide  for  each  individual,  possible  fact,  to  which  processors  the  fact  should  be 
assigned  and  allowed.  However,  moving  away  from  facts  toward  patterns  entails 
additional  complexity,  as  will  be  discussed  as  definitions  are  given. 

Definition 57. A pattern mapper $\Phi_\mathcal{N}$ is a total function from patterns to subsets
of the processor set $\mathcal{N}$ (defined in terms of some non-negative integer n).

A pattern mapper is somewhat analogous to the $\phi$ in a distribution scheme,
thus the similarity in notation. However, it maps patterns to processors instead of
facts to processors.

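Definition 58. A pattern assignment is a triple of pattern mappers $(\Phi_\mathcal{N}, \Omega_\mathcal{N}, \Theta_\mathcal{N})$
such that the following hold for any pattern P:

• $\Phi_\mathcal{N}(P) \subseteq \Omega_\mathcal{N}(P) \subseteq \Theta_\mathcal{N}(P)$;

• for any pattern $P'$ such that $T(P') \subseteq T(P)$,

- $\Phi_\mathcal{N}(P) \subseteq \Phi_\mathcal{N}(P')$,

- $\Omega_\mathcal{N}(P') \subseteq \Omega_\mathcal{N}(P)$,

- $\Theta_\mathcal{N}(P') \subseteq \Theta_\mathcal{N}(P)$.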

A pattern assignment consists of three pattern mappers and forces coherent
relationships between them. $\Theta_\mathcal{N}(P)$ is a set of processors that could be allowed to
have facts in T(P). $\Phi_\mathcal{N}(P)$ can be thought of as a set of processors where facts
in T(P) can definitely be found (if they exist in the distributed factset), whereas
$\Omega_\mathcal{N}(P)$ is the set of processors to which facts in T(P) ought to be inferred. This
relationship between patterns and facts is made explicit in definition 59.

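Definition 59. A distribution scheme $\mathcal{D} = (\mathcal{N}, \phi, \theta)$ is said to conform to a pattern
assignment $(\Phi_\mathcal{N}, \Omega_\mathcal{N}, \Theta_\mathcal{N})$ iff for any pattern P, the following hold:

• $\Phi_\mathcal{N}(P) \subseteq \bigcap_{f \in T(P)} \phi(f)$;

• $\bigcup_{f \in T(P)} \phi(f) \subseteq \Omega_\mathcal{N}(P)$;

• $\bigcup_{f \in T(P)} \theta(f) \subseteq \Theta_\mathcal{N}(P)$.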
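
For a finite collection of facts and patterns, the three conformance conditions can be checked directly. The following Python sketch assumes dictionaries as stand-ins for the functions involved: `phi` and `theta` map each fact to a set of processors, `gamma` maps each pattern to the finite set of facts it matches (a stand-in for T(P)), and `Phi`, `Omega`, `Theta` map each pattern to a set of processors. All names and example values are hypothetical.

```python
def conforms(phi, theta, gamma, Phi, Omega, Theta):
    """Check the three conditions of definition 59 over finite inputs."""
    for pattern, facts in gamma.items():
        for fact in facts:
            # Phi(P) within the intersection of phi(f) means Phi(P) within every phi(f).
            if not Phi[pattern] <= phi[fact]:
                return False
            # The union of phi(f) within Omega(P) means every phi(f) within Omega(P).
            if not phi[fact] <= Omega[pattern]:
                return False
            # Likewise for theta(f) and Theta(P).
            if not theta[fact] <= Theta[pattern]:
                return False
    return True


gamma = {"P": {"f1", "f2"}}
phi = {"f1": {0, 1}, "f2": {0}}
theta = {"f1": {0, 1}, "f2": {0, 1}}
Phi, Omega, Theta = {"P": {0}}, {"P": {0, 1}}, {"P": {0, 1}}
print(conforms(phi, theta, gamma, Phi, Omega, Theta))
```

In this example only processor 0 is guaranteed to hold every fact matched by P, so claiming `Phi = {"P": {0, 1}}` instead would violate the first condition.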

When dealing with only ground rules, a distinction like that between $\Phi_\mathcal{N}$ and
$\Omega_\mathcal{N}$ is unnecessary because the level of granularity is finer. With patterns, though,
one can no longer say that one rule instance of r will match on processor i while
another rule instance of r will match on a different processor j because we are no
longer  working  at  that  level  of  detail.  Now  it  must  be  said  that  all  the  rule  instances 
of  a  rule  r  will  match  on  some  processor (s),  and  all  the  rule  instances  of  a  rule  r 
should  infer  to  some  processor(s).  Thus,  precision  is  being  lost,  and  the  conditions 
are  moving  farther  away  from  being  necessary  (although  I  have  not  proven  any  of 
the  conditions  to  be  necessary  but  only  sufficient).  In  other  words,  as  will  be  shown, 
when  the  conditions  for  rules  (to  be  given)  are  met,  the  conditions  for  rule  instances 
are  also  met,  but  not  vice  versa. 

As  mentioned,  the  conditions  of  definition  58  enforce  some  basic  coherence.  For 
example,  processors  that  are  guaranteed  to  have  all  facts  in  T(P)  are  also  processors 
to  which  all  facts  in  T(P)  must  be  inferred,  and  all  processors  to  which  all  facts  in 
T(P)  must  be  inferred  are  also  processors  that  are  allowed  to  have  facts  in  T(P). 
Additionally, the pattern mappers must be conscious of the relationship between
patterns. Another difference in dealing with patterns instead of directly with
facts is that there are natural relationships between patterns (e.g., $T(P_1) \subseteq T(P_2)$).

The  next  three  lemmas  establish  some  important  relationships  between  pattern 
assignments  and  distribution  schemes  that  will  be  useful  in  proving  the  sufficient 
conditions  for  parallel  inference  with  rules.  In  and  of  themselves,  they  are  not 
particularly  interesting  to  the  overall  rhetoric. 

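Lemma 15. For any distribution scheme $\mathcal{D} = (\mathcal{N}, \phi, \theta)$ that conforms to a pattern
assignment $(\Phi_\mathcal{N}, \Omega_\mathcal{N}, \Theta_\mathcal{N})$, for any rule r, for any $\rho \in \Lambda(r)$:

• $\bigcap_{P \in \mathcal{P}_C^+(r)} \Phi_\mathcal{N}(P) \subseteq \bigcap_{f \in \mathcal{C}^+(\rho)} \phi(f)$;

• $\bigcap_{P \in \mathcal{P}_C^-(r)} \Phi_\mathcal{N}(P) \subseteq \bigcap_{f \in \mathcal{C}^-(\rho)} \phi(f)$;

• $\bigcap_{P \in \mathcal{P}_A^+(r)} \Phi_\mathcal{N}(P) \subseteq \bigcap_{f \in \mathcal{A}^+(\rho)} \phi(f)$;

• $\bigcap_{P \in \mathcal{P}_A^-(r)} \Phi_\mathcal{N}(P) \subseteq \bigcap_{f \in \mathcal{A}^-(\rho)} \phi(f)$.

Proof. $k \in \bigcap_{P \in \mathcal{P}_C^+(r)} \Phi_\mathcal{N}(P)$ means that for any $P \in \mathcal{P}_C^+(r)$, $k \in \Phi_\mathcal{N}(P)$. Then by
definition 59, for any $P \in \mathcal{P}_C^+(r)$, $k \in \bigcap_{f \in T(P)} \phi(f)$. So $k \in \bigcap_{P \in \mathcal{P}_C^+(r)} \bigcap_{f \in T(P)} \phi(f)$.
Now consider the f over which intersection is occurring. It is for all
$f \in \bigcup_{P \in \mathcal{P}_C^+(r)} T(P)$. By lemma 14, $\bigcup_{\rho \in \Lambda(r)} \mathcal{C}^+(\rho) \subseteq \bigcup_{P \in \mathcal{P}_C^+(r)} T(P)$. Therefore,
intersecting over $f \in \bigcup_{\rho \in \Lambda(r)} \mathcal{C}^+(\rho)$ will be no more restrictive and produce a superset.
Hence, $\bigcap_{P \in \mathcal{P}_C^+(r)} \bigcap_{f \in T(P)} \phi(f) \subseteq \bigcap_{\rho \in \Lambda(r)} \bigcap_{f \in \mathcal{C}^+(\rho)} \phi(f)$. Then for any $\rho^* \in \Lambda(r)$,
it trivially holds that $\bigcap_{\rho \in \Lambda(r)} \bigcap_{f \in \mathcal{C}^+(\rho)} \phi(f) \subseteq \bigcap_{f \in \mathcal{C}^+(\rho^*)} \phi(f)$. By transitivity, for
any $\rho^* \in \Lambda(r)$, $\bigcap_{P \in \mathcal{P}_C^+(r)} \Phi_\mathcal{N}(P) \subseteq \bigcap_{f \in \mathcal{C}^+(\rho^*)} \phi(f)$. Similar arguments for $\mathcal{C}^-(\rho)$,
$\mathcal{A}^+(\rho)$, and $\mathcal{A}^-(\rho)$ with $\mathcal{P}_C^-(r)$, $\mathcal{P}_A^+(r)$, and $\mathcal{P}_A^-(r)$, respectively. □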

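Lemma 16. For any distribution scheme $\mathcal{D} = (\mathcal{N}, \phi, \theta)$ that conforms to a pattern
assignment $(\Phi_\mathcal{N}, \Omega_\mathcal{N}, \Theta_\mathcal{N})$, for any rule r, for any $\rho \in \Lambda(r)$:

• $\bigcup_{f \in \mathcal{C}^+(\rho)} \phi(f) \subseteq \bigcup_{P \in \mathcal{P}_C^+(r)} \Omega_\mathcal{N}(P)$;

• $\bigcup_{f \in \mathcal{C}^-(\rho)} \phi(f) \subseteq \bigcup_{P \in \mathcal{P}_C^-(r)} \Omega_\mathcal{N}(P)$;

• $\bigcup_{f \in \mathcal{A}^+(\rho)} \phi(f) \subseteq \bigcup_{P \in \mathcal{P}_A^+(r)} \Omega_\mathcal{N}(P)$;

• $\bigcup_{f \in \mathcal{A}^-(\rho)} \phi(f) \subseteq \bigcup_{P \in \mathcal{P}_A^-(r)} \Omega_\mathcal{N}(P)$.

Proof. For any rule instance $\rho^* \in \Lambda(r)$, it holds that $\bigcup_{f \in \mathcal{C}^+(\rho^*)} \phi(f) \subseteq
\bigcup_{\rho \in \Lambda(r)} \bigcup_{f \in \mathcal{C}^+(\rho)} \phi(f)$. Now consider the f over which union is occurring. It is
for all $f \in \bigcup_{\rho \in \Lambda(r)} \mathcal{C}^+(\rho)$. By lemma 14, $\bigcup_{\rho \in \Lambda(r)} \mathcal{C}^+(\rho) \subseteq \bigcup_{P \in \mathcal{P}_C^+(r)} T(P)$. Therefore,
union over $f \in \bigcup_{P \in \mathcal{P}_C^+(r)} T(P)$ will be no less inclusive and produce a superset.
Hence, $\bigcup_{\rho \in \Lambda(r)} \bigcup_{f \in \mathcal{C}^+(\rho)} \phi(f) \subseteq \bigcup_{P \in \mathcal{P}_C^+(r)} \bigcup_{f \in T(P)} \phi(f)$, and by definition 59,
$\bigcup_{P \in \mathcal{P}_C^+(r)} \bigcup_{f \in T(P)} \phi(f) \subseteq \bigcup_{P \in \mathcal{P}_C^+(r)} \Omega_\mathcal{N}(P)$. Similar arguments for $\mathcal{C}^-(\rho)$, $\mathcal{A}^+(\rho)$, and
$\mathcal{A}^-(\rho)$ with $\mathcal{P}_C^-(r)$, $\mathcal{P}_A^+(r)$, and $\mathcal{P}_A^-(r)$, respectively. □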

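Lemma 17. For any distribution scheme $\mathcal{D} = (\mathcal{N}, \phi, \theta)$ that conforms to a pattern
assignment $(\Phi_\mathcal{N}, \Omega_\mathcal{N}, \Theta_\mathcal{N})$, for any rule r, for any $\rho \in \Lambda(r)$:

• $\bigcup_{f \in \mathcal{C}^+(\rho)} \theta(f) \subseteq \bigcup_{P \in \mathcal{P}_C^+(r)} \Theta_\mathcal{N}(P)$;

• $\bigcup_{f \in \mathcal{C}^-(\rho)} \theta(f) \subseteq \bigcup_{P \in \mathcal{P}_C^-(r)} \Theta_\mathcal{N}(P)$;

• $\bigcup_{f \in \mathcal{A}^+(\rho)} \theta(f) \subseteq \bigcup_{P \in \mathcal{P}_A^+(r)} \Theta_\mathcal{N}(P)$;

• $\bigcup_{f \in \mathcal{A}^-(\rho)} \theta(f) \subseteq \bigcup_{P \in \mathcal{P}_A^-(r)} \Theta_\mathcal{N}(P)$.

Proof. Similar arguments as for lemma 16. □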

The following four lemmas give the sufficient conditions for correct parallel
inference with rules. The goal here is to determine conditions on the rules such that
something can be said about all the rule instances of those rules and then tie the
conclusions into previously stated theorems about rule instances. The first lemma
proves conditions on a rule that are sufficient for showing that all the rule instances
of that rule are $\mathcal{D}$-matchable. The second, third, and fourth lemmas show the same
for $\mathcal{D}$-blockable, $\mathcal{D}$-retractable, and $\mathcal{D}$-preserving, respectively.

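Lemma 18. Let r be a rule, and let $\mathcal{D} = (\mathcal{N}, \phi, \theta)$ be a distribution scheme that
conforms to pattern assignment $(\Phi_\mathcal{N}, \Omega_\mathcal{N}, \Theta_\mathcal{N})$. If one of the following holds:

• $\bigcap_{P \in \mathcal{P}_C^+(r)} \Phi_\mathcal{N}(P) \neq \emptyset$, or

• $\bigvee_{Q \in \mathcal{P}_C^+(r)} \bigwedge_{P \in \mathcal{P}_C^+(r) \setminus \{Q\}} [\Phi_\mathcal{N}(P) = \mathcal{N}]$;

then every $\rho \in \Lambda(r)$ is $\mathcal{D}$-matchable.

Proof. If $\bigcap_{P \in \mathcal{P}_C^+(r)} \Phi_\mathcal{N}(P) \neq \emptyset$, then by lemma 15, for any $\rho \in \Lambda(r)$, since
$\emptyset \subset \bigcap_{P \in \mathcal{P}_C^+(r)} \Phi_\mathcal{N}(P) \subseteq \bigcap_{f \in \mathcal{C}^+(\rho)} \phi(f)$, then $\bigcap_{f \in \mathcal{C}^+(\rho)} \phi(f) \neq \emptyset$.

If $\bigvee_{Q \in \mathcal{P}_C^+(r)} \bigwedge_{P \in \mathcal{P}_C^+(r) \setminus \{Q\}} [\Phi_\mathcal{N}(P) = \mathcal{N}]$, then for some $Q \in \mathcal{P}_C^+(r)$,
$\bigcap_{P \in \mathcal{P}_C^+(r) \setminus \{Q\}} \Phi_\mathcal{N}(P) = \mathcal{N}$. By definition of $\mathcal{P}_C^+(r)$, this means that for any
$\rho \in \Lambda(r)$, there exists at most one $g \in \mathcal{C}^+(\rho)$ such that $\phi(g) \neq \mathcal{N}$, which means
$\bigwedge_{f \in \mathcal{C}^+(\rho) \setminus \{g\}} [\phi(f) = \mathcal{N}]$. Therefore, for any $\rho \in \Lambda(r)$, the conditions of definition 45
hold true, and $\rho$ is $\mathcal{D}$-matchable. □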

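Lemma 19. Let r be a rule, and let $\mathcal{D} = (\mathcal{N}, \phi, \theta)$ be a distribution scheme that
conforms to pattern assignment $(\Phi_\mathcal{N}, \Omega_\mathcal{N}, \Theta_\mathcal{N})$. If one of the following holds:

• $\mathcal{P}_C^-(r) = \emptyset$, or

• $\bigcup_{P \in \mathcal{P}_C^+(r)} \Theta_\mathcal{N}(P) \subseteq \bigcap_{P \in \mathcal{P}_C^-(r)} \Phi_\mathcal{N}(P)$;

then every $\rho \in \Lambda(r)$ is $\mathcal{D}$-blockable.

Proof. If $\mathcal{P}_C^-(r) = \emptyset$, then for any $\rho \in \Lambda(r)$, $\mathcal{C}^-(\rho) = \emptyset$.

If $\bigcup_{P \in \mathcal{P}_C^+(r)} \Theta_\mathcal{N}(P) \subseteq \bigcap_{P \in \mathcal{P}_C^-(r)} \Phi_\mathcal{N}(P)$, then for any $\rho \in \Lambda(r)$, lemma 17
implies that $\bigcup_{f \in \mathcal{C}^+(\rho)} \theta(f) \subseteq \bigcup_{P \in \mathcal{P}_C^+(r)} \Theta_\mathcal{N}(P)$, and lemma 15 implies that
$\bigcap_{P \in \mathcal{P}_C^-(r)} \Phi_\mathcal{N}(P) \subseteq \bigcap_{f \in \mathcal{C}^-(\rho)} \phi(f)$. By transitivity, $\bigcup_{f \in \mathcal{C}^+(\rho)} \theta(f) \subseteq \bigcap_{f \in \mathcal{C}^-(\rho)} \phi(f)$.
Note that $\bigcap_{f \in \mathcal{C}^+(\rho)} \theta(f) \subseteq \bigcup_{f \in \mathcal{C}^+(\rho)} \theta(f)$, so it further holds that $\bigcap_{f \in \mathcal{C}^+(\rho)} \theta(f) \subseteq
\bigcap_{f \in \mathcal{C}^-(\rho)} \phi(f)$ for any $\rho \in \Lambda(r)$. Therefore, for any $\rho \in \Lambda(r)$, the conditions of definition
46 hold true, and $\rho$ is $\mathcal{D}$-blockable. □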

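Lemma 20. Let r be a rule, and let $\mathcal{D} = (\mathcal{N}, \phi, \theta)$ be a distribution scheme that
conforms to pattern assignment $(\Phi_\mathcal{N}, \Omega_\mathcal{N}, \Theta_\mathcal{N})$ which cover the condition formula of
r. If $\bigcup_{P \in \mathcal{P}_A^-(r)} \Theta_\mathcal{N}(P) \subseteq \bigcap_{P \in \mathcal{P}_C^+(r)} \Phi_\mathcal{N}(P)$, then every $\rho \in \Lambda(r)$ is $\mathcal{D}$-retractable.

Proof. If $\bigcup_{P \in \mathcal{P}_A^-(r)} \Theta_\mathcal{N}(P) \subseteq \bigcap_{P \in \mathcal{P}_C^+(r)} \Phi_\mathcal{N}(P)$, then by lemma 17, $\bigcup_{f \in \mathcal{A}^-(\rho)} \theta(f) \subseteq
\bigcup_{P \in \mathcal{P}_A^-(r)} \Theta_\mathcal{N}(P)$, and by lemma 15, $\bigcap_{P \in \mathcal{P}_C^+(r)} \Phi_\mathcal{N}(P) \subseteq \bigcap_{f \in \mathcal{C}^+(\rho)} \phi(f)$. By transitivity,
for any $\rho \in \Lambda(r)$, $\bigcup_{f \in \mathcal{A}^-(\rho)} \theta(f) \subseteq \bigcap_{f \in \mathcal{C}^+(\rho)} \phi(f)$. Therefore, for any $\rho \in \Lambda(r)$,
the condition of definition 52 holds true, and $\rho$ is $\mathcal{D}$-retractable. □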

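Lemma 21. Let r be a rule, and let $\mathcal{D} = (\mathcal{N}, \phi, \theta)$ be a distribution scheme that conforms
to pattern assignment $(\Phi_\mathcal{N}, \Omega_\mathcal{N}, \Theta_\mathcal{N})$. If $\bigcup_{P \in \mathcal{P}_A^+(r)} \Omega_\mathcal{N}(P) \subseteq \bigcap_{P \in \mathcal{P}_C^+(r)} \Phi_\mathcal{N}(P)$,
then every $\rho \in \Lambda(r)$ is $\mathcal{D}$-preserving.

Proof. If $\bigcup_{P \in \mathcal{P}_A^+(r)} \Omega_\mathcal{N}(P) \subseteq \bigcap_{P \in \mathcal{P}_C^+(r)} \Phi_\mathcal{N}(P)$, then by lemma 16, $\bigcup_{f \in \mathcal{A}^+(\rho)} \phi(f) \subseteq
\bigcup_{P \in \mathcal{P}_A^+(r)} \Omega_\mathcal{N}(P)$, and by lemma 15, $\bigcap_{P \in \mathcal{P}_C^+(r)} \Phi_\mathcal{N}(P) \subseteq \bigcap_{f \in \mathcal{C}^+(\rho)} \phi(f)$. By transitivity,
for any $\rho \in \Lambda(r)$, $\bigcup_{f \in \mathcal{A}^+(\rho)} \phi(f) \subseteq \bigcap_{f \in \mathcal{C}^+(\rho)} \phi(f)$. Therefore, for any $\rho \in \Lambda(r)$,
the condition of definition 53 holds true, and $\rho$ is $\mathcal{D}$-preserving. □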

Finally,  it  can  be  said  that  sufficient  conditions  on  rules  have  been  proven  for 
correct  parallel  inference,  and  this  section  ends  with  that  very  corollary. 

Corollary 22. Let R be a polarized ruleset, let S be a RAOC, let $\mathcal{I}$ be any
information keeper, and let $(\Phi_\mathcal{N}, \Omega_\mathcal{N}, \Theta_\mathcal{N})$ be a pattern assignment. If the
conditions of lemmas 18, 19, 20, and 21 are met for every rule $r \in R$, then program
$\Pi = (\mathcal{I}, \mathcal{H}_{fix}, S, R)$ is cyclically $\mathcal{D}$-parallel where $\mathcal{D}$ is any distribution scheme that
conforms to $(\Phi_\mathcal{N}, \Omega_\mathcal{N}, \Theta_\mathcal{N})$.

Proof. Follows immediately from lemmas 18, 19, 20, and 21; and theorem 10. □

3.4 Summary

This chapter defined the syntax and mathematical notation for rules in section
3.1.1 and defined an operational semantics in section 3.1.2. In section 3.2, definitions
and operational semantics were given for parallel inference, and section 3.2.1
provided definitions for what it means for parallel inference to be correct. Definition
42 is the definition for correctness relied upon for the remainder of this thesis.

Section  3.3  contained  the  significantly  novel  contribution  of  this  chapter.  In 
section  3.3.1,  sufficient  conditions  are  determined  for  ground  rules  such  that,  when 
the  conditions  are  met  for  every  possible  ground  rule  in  inference,  parallel  inference 
is  guaranteed  to  be  correct  relative  to  a  distribution  scheme.  To  be  more  useful, 
though,  the  conditions  are  generalized  to  rules  (ground  or  otherwise)  in  section  3.3.2. 
These  conditions  provide  the  foundation  for  the  findings  of  the  following  chapter. 


CHAPTER  4 

PRACTICAL  APPLICATION  OF  CONDITIONS  FOR 
PARALLEL  INFERENCE 

In  this  chapter,  the  sufficient  conditions  determined  in  the  previous  chapter  are  used 
to  derive  a  method  to  restrict  (polarized)  rulesets  such  that  parallel  inference  with 
the  restricted  version  of  the  ruleset  (with  a  RAOC)  is  correct.  In  section  4.1,  a 
special  class  of  distribution  schemes  -  called  replication  schemes  -  is  considered, 
and  new,  simpler  notation  is  defined.  The  sufficient  conditions  from  the  previous 
chapter are then recast into the simpler notation, which reveals a possible reduction
to satisfiability. This possibility is further explored and confirmed in section
4.2.  Specifically,  testing  sufficient  conditions  for  correct  parallel  inference  with  a 
replication  scheme  is  reducible  to  2SAT.  Then,  the  2SAT  reduction  is  augmented 
to  a  3SAT  reduction  that  allows  for  the  option  to  sacrifice  rules  in  order  to  improve 
parallel  inference.  The  problem  arises,  then,  that  the  search  space  for  solutions  to 
the  3SAT  formula  becomes  quite  large  for  even  moderately  sized  rulesets.  Therefore, 
a  methodology  is  proposed  for  reducing  the  search  space  in  section  4.2.3,  and  it  is 
applied  to  restrict  the  RDFS  and  OWL2RL  rulesets  in  section  4.3. 

4.1  Replication  Schemes 

In  this  section,  a  specific  class  of  distribution  schemes  is  introduced  called 
replication  schemes.  General  distribution  schemes  are  difficult  to  manage  because 
they  require  that,  for  every  processor,  it  must  be  decided  whether  it  is  allowed  to 
have  a  given  fact,  and  if  so,  whether  it  must  have  that  fact.  Although  pattern 
assignments  allow  assignment  of  facts  by  looking  at  a  finite  set  of  patterns,  this 
actually  complicates  the  process,  even  though  it  makes  it  more  tractable.  That  is, 
not  every  possible  fact  needs  to  be  considered  (of  which  there  could  be  infinitely 
many),  but  for  each  processor,  it  must  be  decided  whether  the  processor  is  allowed 
to  have  facts  matched  by  the  pattern;  and  if  so,  whether  inferences  matched  by  the 
pattern  must  go  to  that  processor;  and  if  so,  whether  it  should  be  guaranteed  that 
the processor has such facts. Additionally, the pattern assignment must be checked
for  validity  as  given  by  the  conditions  in  definition  58. 

A simpler form of distribution scheme, which has previously been useful in
[20, 21], is to restrict fact assignment to two possibilities: for any fact, either replicate
it to all processors, or place it arbitrarily at some processor(s). Definition 60 captures
this notion. Note that even though it is possible that for a fact f, $\phi(f) = \emptyset$, it
must still hold that f is placed at some processor when distributed because the
union across the processors’ factsets must equal the factset prior to distribution (by
definition 38).

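Definition 60. A replication scheme is a distribution scheme $\mathcal{R} = (\mathcal{N}, \phi, \theta)$ such
that for any fact f:

• $\phi(f) = \emptyset$ or $\phi(f) = \mathcal{N}$;

• $\theta(f) = \mathcal{N}$.

Definition 61. A pattern replicator is a pattern assignment $(\Phi_\mathcal{N}, \Omega_\mathcal{N}, \Theta_\mathcal{N})$ such
that for any pattern P:

• $\Phi_\mathcal{N}(P) = \emptyset$ or $\Phi_\mathcal{N}(P) = \mathcal{N}$;

• $\Omega_\mathcal{N}(P) = \emptyset$ or $\Omega_\mathcal{N}(P) = \mathcal{N}$;

• $\Theta_\mathcal{N}(P) = \mathcal{N}$.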

At  this  point,  it  is  useful  to  recast  previous  theorems  and  definitions  in  terms  of 
replication  schemes,  and  these  new  theorems  and  definitions  will  provide  the  basis 
for  the  main  findings  with  regard  to  replication  schemes  and  pattern  replicators. 
Lemma  23  builds  on  definition  59  to  prove  what  it  means  for  a  replication  scheme 
to  conform  to  a  pattern  replicator. 

The  gist  of  lemma  23  is  as  follows.  If  some  processor  must  guarantee  that  it 
has  facts  matched  by  a  pattern,  then  that  guarantee  is  made  for  all  processors.  If 
inferences  matched  by  a  pattern  need  not  be  inferred  to  any  particular  processor, 
then  no  guarantee  is  made  about  the  particular  placement  of  such  facts.  Also,  facts 
matched  by  any  pattern  are  allowed  to  be  placed  at  any  processor.  These  are  proven 
to  hold  true  for  any  replication  scheme  conforming  to  a  pattern  replicator. 
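
Lemma 23. A replication scheme $\mathcal{R} = (\mathcal{N}, \phi, \theta)$ conforms to a pattern replicator
$(\Phi_\mathcal{N}, \Omega_\mathcal{N}, \Theta_\mathcal{N})$ iff the following hold for any pattern P:

• if $\Phi_\mathcal{N}(P) = \mathcal{N}$, then for all $f \in T(P)$, $\phi(f) = \mathcal{N}$;

• if $\Omega_\mathcal{N}(P) = \emptyset$, then for all $f \in T(P)$, $\phi(f) = \emptyset$;

• $\Theta_\mathcal{N}(P) = \mathcal{N}$.

Proof. (→) Assume that $\mathcal{R}$ conforms to $(\Phi_\mathcal{N}, \Omega_\mathcal{N}, \Theta_\mathcal{N})$.

By definition 59, it must hold for any pattern P that $\Phi_\mathcal{N}(P) \subseteq \bigcap_{f \in T(P)} \phi(f)$. If
$\Phi_\mathcal{N}(P) = \mathcal{N}$, then $\bigcap_{f \in T(P)} \phi(f) = \mathcal{N}$, which means that for all $f \in T(P)$, $\phi(f) = \mathcal{N}$.

By definition 59, it must hold for any pattern P that $\bigcup_{f \in T(P)} \phi(f) \subseteq \Omega_\mathcal{N}(P)$.
For any pattern P, if $\Omega_\mathcal{N}(P) = \emptyset$, then $\bigcup_{f \in T(P)} \phi(f) = \emptyset$, which means that for all
$f \in T(P)$, $\phi(f) = \emptyset$.

By definition 59, it must hold for any pattern P that $\bigcup_{f \in T(P)} \theta(f) \subseteq \Theta_\mathcal{N}(P)$.
By definition 60, for any fact f, $\theta(f) = \mathcal{N}$. This means that $\mathcal{N} = \bigcup_{f \in T(P)} \theta(f) \subseteq
\Theta_\mathcal{N}(P)$.

(←) Proof by contradiction. Assume that the three bulleted conditions hold
true but $\mathcal{R}$ does not conform to pattern replicator $(\Phi_\mathcal{N}, \Omega_\mathcal{N}, \Theta_\mathcal{N})$.

If $\Phi_\mathcal{N}(P) = \emptyset$, then it is trivially true that $\Phi_\mathcal{N}(P) \subseteq \bigcap_{f \in T(P)} \phi(f)$. If
$\Phi_\mathcal{N}(P) = \mathcal{N}$, then $\bigcap_{f \in T(P)} \phi(f) = \mathcal{N}$, and so it is again trivially true that $\Phi_\mathcal{N}(P) \subseteq
\bigcap_{f \in T(P)} \phi(f)$. Therefore, if $\mathcal{R}$ does not conform to $(\Phi_\mathcal{N}, \Omega_\mathcal{N}, \Theta_\mathcal{N})$, it is not because
$\Phi_\mathcal{N}(P) \not\subseteq \bigcap_{f \in T(P)} \phi(f)$.

If $\Omega_\mathcal{N}(P) = \mathcal{N}$, then it is trivially true that $\bigcup_{f \in T(P)} \phi(f) \subseteq \Omega_\mathcal{N}(P)$. If $\Omega_\mathcal{N}(P) =
\emptyset$, then $\bigcup_{f \in T(P)} \phi(f) = \emptyset$, and again, it is trivially true that $\bigcup_{f \in T(P)} \phi(f) \subseteq \Omega_\mathcal{N}(P)$.
Therefore, if $\mathcal{R}$ does not conform to $(\Phi_\mathcal{N}, \Omega_\mathcal{N}, \Theta_\mathcal{N})$, it is not because $\bigcup_{f \in T(P)} \phi(f) \not\subseteq
\Omega_\mathcal{N}(P)$.

Since $\Theta_\mathcal{N}(P) = \mathcal{N}$, it is trivially true that $\bigcup_{f \in T(P)} \theta(f) \subseteq \Theta_\mathcal{N}(P)$. Therefore,
if $\mathcal{R}$ does not conform to $(\Phi_\mathcal{N}, \Omega_\mathcal{N}, \Theta_\mathcal{N})$, it is not because $\bigcup_{f \in T(P)} \theta(f) \not\subseteq \Theta_\mathcal{N}(P)$.
Therefore, it cannot be that $\mathcal{R}$ does not conform to $(\Phi_\mathcal{N}, \Omega_\mathcal{N}, \Theta_\mathcal{N})$, which is a
contradiction. □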


Notation. Now that the choices are between $\emptyset$ and $\mathcal{N}$, there is no longer a
need for the complexity in notation brought by working with subsets of $\mathcal{N}$. For a
replication scheme $\mathcal{R} = (\mathcal{N}, \phi, \theta)$, let $\mathcal{R}(f) = [\phi(f) = \mathcal{N}]$ and $\neg\mathcal{R}(f) = [\phi(f) = \emptyset]$.

Let a pattern replicator be denoted as a pair $(\epsilon, \alpha)$ where $\epsilon(P) = [\Omega_\mathcal{N}(P) = \emptyset]$,
$\neg\epsilon(P) = [\Omega_\mathcal{N}(P) = \mathcal{N}]$, $\alpha(P) = [\Phi_\mathcal{N}(P) = \mathcal{N}]$, and $\neg\alpha(P) = [\Phi_\mathcal{N}(P) = \emptyset]$.

Corollary 24 is a direct recasting of lemma 23, illustrating the neatness of the
new notation and justifying the intuition of the notation. The formulas in corollary
24 seem self-justifying. That is, $\alpha(P)$ implies that all the facts in T(P) are replicated
to all processors, and $\epsilon(P)$ implies that all the facts in T(P) are placed arbitrarily.
Note, though, that it is possible that for a pattern P, $\neg\alpha(P) \wedge \neg\epsilon(P)$. That is,
patterns do not have to fall into one of these two categories, although intuitively, a
pattern cannot be in both categories.

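Corollary 24. A replication scheme $\mathcal{R}$ conforms to pattern replicator $(\epsilon, \alpha)$ iff the
following hold for any pattern P:

• $\alpha(P) \rightarrow \bigwedge_{f \in T(P)} \mathcal{R}(f)$;

• $\epsilon(P) \rightarrow \bigwedge_{f \in T(P)} \neg\mathcal{R}(f)$.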

Proof.  Straightforward  rewriting  of  lemma  23  using  the  new  notation.  □ 
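
With the Boolean notation, conformance reduces to two implications per pattern. The following Python sketch checks them over finite inputs. The dictionaries are hypothetical stand-ins: `replicated[f]` plays the role of $\mathcal{R}(f)$, `gamma[P]` is a finite stand-in for T(P), and `alpha` and `epsilon` give the truth values of $\alpha(P)$ and $\epsilon(P)$. The pattern and fact names are invented for the example.

```python
def replicator_conforms(replicated, gamma, alpha, epsilon):
    """Check the two implications of corollary 24 for every pattern."""
    for pattern, facts in gamma.items():
        # alpha(P) requires every fact matched by P to be replicated.
        if alpha[pattern] and not all(replicated[f] for f in facts):
            return False
        # epsilon(P) requires every fact matched by P to be placed arbitrarily.
        if epsilon[pattern] and any(replicated[f] for f in facts):
            return False
    return True


gamma = {"P_schema": {"s1", "s2"}, "P_data": {"d1"}}
replicated = {"s1": True, "s2": True, "d1": False}
alpha = {"P_schema": True, "P_data": False}
epsilon = {"P_schema": False, "P_data": True}
print(replicator_conforms(replicated, gamma, alpha, epsilon))
```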

Finally, the previous lemma and corollary can be used to prove sufficient conditions
for parallel inference with replication schemes (with polarized rulesets and
RAOCs), following quickly from lemmas 18, 19, 20, and 21. To briefly summarize
the corollaries, let $\mathcal{R}$ be a replication scheme that conforms to pattern replicator
$(\epsilon, \alpha)$. Corollary 25 says that if all but at most one pattern in a rule condition have
their facts replicated, then all the instances of the rule are $\mathcal{R}$-matchable. Corollary
26 states that if all facts matched by patterns corresponding to negated formulas in
a rule are replicated, then every instance of that rule is $\mathcal{R}$-blockable.
Corollary 27 says that if a rule with retract actions has all the facts matching patterns
in its condition replicated, then every instance of the rule is $\mathcal{R}$-retractable. (A
rule without retract actions is inherently $\mathcal{R}$-retractable.) Corollary 28 says that if
a rule is such that either all the facts it can infer can be placed arbitrarily or all
the facts matched by its condition are replicated, then every instance of the rule is
$\mathcal{R}$-preserving.
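
Corollary 25. Let r be a rule, and let $\mathcal{R}$ be a replication scheme that conforms
to pattern replicator $(\epsilon, \alpha)$. If $\bigvee_{Q \in \mathcal{P}_C^+(r)} \bigwedge_{P \in \mathcal{P}_C^+(r) \setminus \{Q\}} \alpha(P)$, then every $\rho \in \Lambda(r)$ is
$\mathcal{R}$-matchable.

Proof. Let $(\epsilon, \alpha)$ correspond to $(\Phi_\mathcal{N}, \Omega_\mathcal{N}, \Theta_\mathcal{N})$ as described in the notation. If
$\bigvee_{Q \in \mathcal{P}_C^+(r)} \bigwedge_{P \in \mathcal{P}_C^+(r) \setminus \{Q\}} \alpha(P)$, then $\bigvee_{Q \in \mathcal{P}_C^+(r)} \bigwedge_{P \in \mathcal{P}_C^+(r) \setminus \{Q\}} [\Phi_\mathcal{N}(P) = \mathcal{N}]$, which by
lemma 18 means that every $\rho \in \Lambda(r)$ is $\mathcal{R}$-matchable. □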

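Corollary 26. Let r be a rule, and let $\mathcal{R}$ be a replication scheme that conforms to
pattern replicator $(\epsilon, \alpha)$. If $\bigwedge_{P \in \mathcal{P}_C^-(r)} \alpha(P)$, then every $\rho \in \Lambda(r)$ is $\mathcal{R}$-blockable.

Proof. Let $(\epsilon, \alpha)$ correspond to $(\Phi_\mathcal{N}, \Omega_\mathcal{N}, \Theta_\mathcal{N})$ as described in the notation. If
$\bigwedge_{P \in \mathcal{P}_C^-(r)} \alpha(P)$, then $\bigwedge_{P \in \mathcal{P}_C^-(r)} [\Phi_\mathcal{N}(P) = \mathcal{N}]$. Recall from lemma 23 that for any
pattern P, $\Theta_\mathcal{N}(P) = \mathcal{N}$. This means that $\bigcup_{P \in \mathcal{P}_C^+(r)} \Theta_\mathcal{N}(P) \subseteq \bigcap_{P \in \mathcal{P}_C^-(r)} \Phi_\mathcal{N}(P)$,
which by lemma 19 means that every $\rho \in \Lambda(r)$ is $\mathcal{R}$-blockable. □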

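Corollary 27. Let r be a rule, and let $\mathcal{R}$ be a replication scheme that conforms to
pattern replicator $(\epsilon, \alpha)$. If $\mathcal{P}_A^-(r) = \emptyset$ or $\bigwedge_{P \in \mathcal{P}_C^+(r)} \alpha(P)$, then every $\rho \in \Lambda(r)$ is
$\mathcal{R}$-retractable.

Proof. Let $(\epsilon, \alpha)$ correspond to $(\Phi_\mathcal{N}, \Omega_\mathcal{N}, \Theta_\mathcal{N})$ as described in the notation. If
$\mathcal{P}_A^-(r) = \emptyset$, then it is trivially true that $\bigcup_{P \in \mathcal{P}_A^-(r)} \Theta_\mathcal{N}(P) \subseteq \bigcap_{P \in \mathcal{P}_C^+(r)} \Phi_\mathcal{N}(P)$ since
$\bigcup_{P \in \emptyset} \Theta_\mathcal{N}(P) = \emptyset$. If $\bigwedge_{P \in \mathcal{P}_C^+(r)} \alpha(P)$, then $\bigwedge_{P \in \mathcal{P}_C^+(r)} [\Phi_\mathcal{N}(P) = \mathcal{N}]$. Recall from lemma
23 that for any pattern P, $\Theta_\mathcal{N}(P) = \mathcal{N}$. This means that $\bigcup_{P \in \mathcal{P}_A^-(r)} \Theta_\mathcal{N}(P) \subseteq
\bigcap_{P \in \mathcal{P}_C^+(r)} \Phi_\mathcal{N}(P)$, which by lemma 20 means that every $\rho \in \Lambda(r)$ is $\mathcal{R}$-retractable. □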

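Corollary 28. Let r be a rule, and let $\mathcal{R}$ be a replication scheme that conforms
to pattern replicator $(\epsilon, \alpha)$. If $[\bigvee_{P \in \mathcal{P}_A^+(r)} \neg\epsilon(P)] \rightarrow [\bigwedge_{P \in \mathcal{P}_C^+(r)} \alpha(P)]$, then every $\rho \in
\Lambda(r)$ is $\mathcal{R}$-preserving.

Proof. Let $(\epsilon, \alpha)$ correspond to $(\Phi_\mathcal{N}, \Omega_\mathcal{N}, \Theta_\mathcal{N})$ as described in the notation. If
$[\bigvee_{P \in \mathcal{P}_A^+(r)} \neg\epsilon(P)] \rightarrow [\bigwedge_{P \in \mathcal{P}_C^+(r)} \alpha(P)]$, then $[\bigvee_{P \in \mathcal{P}_A^+(r)} \Omega_\mathcal{N}(P) = \mathcal{N}] \rightarrow
[\bigwedge_{P \in \mathcal{P}_C^+(r)} \Phi_\mathcal{N}(P) = \mathcal{N}]$, which is equivalently stated that $[\bigcup_{P \in \mathcal{P}_A^+(r)} \Omega_\mathcal{N}(P) = \mathcal{N}] \rightarrow
[\bigcap_{P \in \mathcal{P}_C^+(r)} \Phi_\mathcal{N}(P) = \mathcal{N}]$, which also means (taking into consideration definition 61)
$\bigcup_{P \in \mathcal{P}_A^+(r)} \Omega_\mathcal{N}(P) \subseteq \bigcap_{P \in \mathcal{P}_C^+(r)} \Phi_\mathcal{N}(P)$. Then by lemma 21, every $\rho \in \Lambda(r)$ is
$\mathcal{R}$-preserving. □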

These conditions, particularly those placed on negation and retraction, are
quite restrictive. Starting with sufficient conditions on rule instances, then sufficient
conditions on rules, and now sufficient conditions on rules for replication schemes,
the conditions are becoming increasingly restrictive and unnecessary (unnecessary
in the sense of “necessary and sufficient conditions”). However, as the imprecision of
the conditions increases, it appears that their simplicity and utility also increase,
as is demonstrated in the following section.

4.2  Reductions  to  Satisfiability 

By inspection, the conditions of the previous four corollaries are clearly satisfiability
formulas, which implies that checking the conditions can be done by reduction
to satisfiability. Specifically, as shown in section 4.2.1, they can be checked by reduction
to 2SAT. 2SAT is a particularly desirable version of the SAT problem because
it can be solved efficiently, in polynomial (indeed linear) time.
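To make the tractability claim concrete, the following is a minimal sketch of the standard implication-graph method for 2SAT (strongly connected components via Kosaraju's algorithm). It is an illustration only, not the solver used in this work; the function name `solve_2sat` and the integer encoding of literals are assumptions made for the example.

```python
from collections import defaultdict

def solve_2sat(num_vars, clauses):
    """Return a satisfying assignment (list of bools) or None if unsatisfiable.

    Variables are numbered 1..num_vars; a literal is +v or -v;
    each clause is a pair of literals.
    """
    def node(lit):
        # Node 2*(v-1) stands for v, node 2*(v-1)+1 stands for (not v).
        return 2 * (abs(lit) - 1) + (1 if lit < 0 else 0)

    n = 2 * num_vars
    graph, rgraph = defaultdict(list), defaultdict(list)
    for a, b in clauses:
        # (a or b) is equivalent to (not a -> b) and (not b -> a).
        for u, v in ((node(-a), node(b)), (node(-b), node(a))):
            graph[u].append(v)
            rgraph[v].append(u)

    # Pass 1: record nodes in order of DFS finish time (iterative DFS).
    seen, order = [False] * n, []
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(graph[s]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if not seen[v]:
                    seen[v] = True
                    stack.append((v, iter(graph[v])))
                    break
            else:
                order.append(u)
                stack.pop()

    # Pass 2: label components on the reversed graph, latest finish first.
    comp, c = [-1] * n, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = c
        stack = [s]
        while stack:
            u = stack.pop()
            for v in rgraph[u]:
                if comp[v] == -1:
                    comp[v] = c
                    stack.append(v)
        c += 1

    result = []
    for v in range(num_vars):
        pos, neg = comp[2 * v], comp[2 * v + 1]
        if pos == neg:
            return None  # v and (not v) imply each other: unsatisfiable
        result.append(pos > neg)
    return result
```

Both passes visit each node and edge a constant number of times, which is why the check is linear in the size of the formula.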

While merely checking the conditions for parallel inference is useful, many nontrivial
rulesets will likely require all data to be replicated. Thus, it would be useful to
determine which rules (or restricted versions thereof) can be eliminated to increase
parallelism. In section 4.2.2, it is shown that augmenting the 2SAT reduction to
a 3SAT reduction allows for the possibility to consider elimination of rules. As a
result, though, the search space for 3SAT solutions can become insurmountable, and
so in section 4.2.3, a methodology is proposed for reducing the search space so that
a restricted version of a ruleset for parallel inference can quickly be converged upon.
This methodology is then used in section 4.3 to restrict the RDFS and OWL2RL
rulesets into rulesets that are amenable to parallel inference.

4.2.1  Checking  Conditions  by  Reduction  to  2SAT 

In this section, it is shown how checking the sufficient conditions for replication
schemes can be reduced to 2SAT. First, the problem must be clearly defined in terms
of input and output.

Problem 1 (RConds). Given a ruleset $R$ and a pattern replicator $(\epsilon, \alpha)$, determine
whether the conditions of corollaries 25, 26, 27, and 28 are satisfied.

Lemma 29 states that not only are the conditions of the previous four corollaries
SAT formulas, but they can also be equivalently written as 2SAT formulas.

Lemma  29.  RConds  is  reducible  to  2SAT. 

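Proof. Consider the condition of corollary 25, that $\bigvee_{Q \in \mathcal{P}_C^+(r)} \bigwedge_{P \in \mathcal{P}_C^+(r) \setminus \{Q\}} \alpha(P)$.
Suppose that for some $Q \in \mathcal{P}_C^+(r)$, $\neg\alpha(Q)$. Then the previous formula is true
iff $\bigwedge_{P \in \mathcal{P}_C^+(r) \setminus \{Q\}} \alpha(P)$. In other words, the formula can be equivalently formulated
as $\bigwedge_{Q \in \mathcal{P}_C^+(r)} [\neg\alpha(Q) \rightarrow \bigwedge_{P \in \mathcal{P}_C^+(r) \setminus \{Q\}} \alpha(P)]$, which can be further reformulated
as $\bigwedge_{Q \in \mathcal{P}_C^+(r)} [\alpha(Q) \vee \bigwedge_{P \in \mathcal{P}_C^+(r) \setminus \{Q\}} \alpha(P)]$, and then performing distribution,
$\bigwedge_{Q \in \mathcal{P}_C^+(r)} \bigwedge_{P \in \mathcal{P}_C^+(r) \setminus \{Q\}} [\alpha(Q) \vee \alpha(P)]$. This is a 2SAT formula.

Consider the condition of corollary 26. $\bigwedge_{P \in \mathcal{P}_C^-(r)} \alpha(P)$ can be equivalently
reformulated as $\bigwedge_{P \in \mathcal{P}_C^-(r)} [\alpha(P) \vee \alpha(P)]$, which is a 2SAT formula.

Consider the condition of corollary 27. $\bigwedge_{P \in \mathcal{P}_C^+(r)} \alpha(P)$ can be equivalently
reformulated as $\bigwedge_{P \in \mathcal{P}_C^+(r)} [\alpha(P) \vee \alpha(P)]$, which is a 2SAT formula. (When $\mathcal{P}_A^-(r) = \emptyset$,
the condition holds trivially and no clause is generated.)

Consider the condition of corollary 28. $[\bigvee_{Q \in \mathcal{P}_A^+(r)} \neg\epsilon(Q)] \rightarrow [\bigwedge_{P \in \mathcal{P}_C^+(r)} \alpha(P)]$
can be equivalently reformulated as $[\bigwedge_{Q \in \mathcal{P}_A^+(r)} \epsilon(Q)] \vee [\bigwedge_{P \in \mathcal{P}_C^+(r)} \alpha(P)]$, which is
equivalent to $\bigwedge_{Q \in \mathcal{P}_A^+(r)} \bigwedge_{P \in \mathcal{P}_C^+(r)} [\epsilon(Q) \vee \alpha(P)]$, which is a 2SAT formula.

Taking the conjunction of all these 2SAT formulas forms a larger 2SAT formula
such that the formula is satisfiable iff the conditions of corollaries 25, 26, 27, and 28
are met. □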
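The clause generation in this proof can be sketched in a few lines of Python. This is an illustrative reading of the proof, not the implementation used for the experiments. The function name `rule_clauses`, the literal encoding, and the pattern names are assumptions. The treatment of retraction (clauses are emitted only when the rule retracts something) is my reading of corollary 27.

```python
from itertools import combinations

def rule_clauses(pos_cond, neg_cond=(), asserts=(), retracts=()):
    """Return the two-literal clauses for one rule as a list of literal pairs.

    A literal is (function_name, pattern_name), e.g. ("alpha", "P1").
    All literals are positive, as in the proof of lemma 29.
    """
    clauses = []
    # Corollary 25: at most one positive condition pattern is not replicated.
    for q, p in combinations(sorted(pos_cond), 2):
        clauses.append((("alpha", q), ("alpha", p)))
    # Corollary 26: every negated condition pattern is replicated.
    for p in sorted(neg_cond):
        clauses.append((("alpha", p), ("alpha", p)))
    # Corollary 27: if the rule retracts, every positive condition
    # pattern is replicated (assumed reading of the condition).
    if retracts:
        for p in sorted(pos_cond):
            clauses.append((("alpha", p), ("alpha", p)))
    # Corollary 28: each assertion pattern is arbitrarily placed
    # (epsilon), or else each positive condition pattern is replicated.
    for q in sorted(asserts):
        for p in sorted(pos_cond):
            clauses.append((("epsilon", q), ("alpha", p)))
    return clauses

# A rule with two positive condition patterns and one assertion pattern.
example = rule_clauses(pos_cond={"Psp", "P1"}, asserts={"P2"})
```

For a rule with two positive condition patterns and a single assertion pattern, this yields three clauses: one from corollary 25 and two from corollary 28.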

RConds can be solved by reduction to 2SAT, but consider the inputs of
RConds. One of them is a pattern replicator $(\epsilon, \alpha)$, which means that for any pattern
$P$, it must be determined whether $\epsilon(P)$ and/or $\alpha(P)$ hold. This is hardly practical. Consider,
though, the possibility of specifying a finite set of patterns $\mathbf{P}$ such that $\epsilon(P)$ and
$\alpha(P)$ are defined for any $P \in \mathbf{P}$. Clearly, $\mathbf{P}$ should include all the patterns occurring
in the ruleset under consideration, that is, $\mathbf{P} \supseteq \bigcup_{r \in R} \mathcal{P}(r)$.

Suppose there exist $P_1, P_2 \in \mathbf{P}$ such that $\Gamma(P_1) \cap \Gamma(P_2) \neq \emptyset$. Then by definition
37, $P_1$ and $P_2$ are related by any $P_3$ such that $\Gamma(P_3) = \Gamma(P_1) \cap \Gamma(P_2)$ because
$\Gamma(P_3) \subseteq \Gamma(P_1)$ and $\Gamma(P_3) \subseteq \Gamma(P_2)$. Therefore, $P_3$ should be included in $\mathbf{P}$, and $\epsilon(P_3)$
and $\alpha(P_3)$ should be defined.

This leaves open a number of questions, though. Even though it is clear what
$\mathbf{P}$ should be, how is it actually derived (can it even be derived), and how is it
accounted for in the 2SAT reduction?

Starting with the problem of deriving $\mathbf{P}$, it must be defined how to test whether
$\Gamma(P_1) \cap \Gamma(P_2) \neq \emptyset$, and if so, how to derive $P_3$ such that $\Gamma(P_3) = \Gamma(P_1) \cap \Gamma(P_2)$. It
is possible that there could already be a $P_4 \in \mathbf{P}$ such that $\Gamma(P_4) = \Gamma(P_3)$, and so we
need a way to check whether $\Gamma(P_4) = \Gamma(P_3)$.

These problems are solved in the following lemma and proposition. Lemma
30 shows how to test whether $\Gamma(P_1) \cap \Gamma(P_2) \neq \emptyset$, and if it holds, how to determine
$P_3$ such that $\Gamma(P_3) = \Gamma(P_1) \cap \Gamma(P_2)$. Proposition 31 shows how to test whether
$\Gamma(P_4) = \Gamma(P_3)$. Using these approaches, $\mathbf{P}$ can be derived as follows. Initialize
$\mathbf{P} = \bigcup_{r \in R} \mathcal{P}(r)$. Then for every $P_1, P_2 \in \mathbf{P}$ such that $\Gamma(P_1) \cap \Gamma(P_2) \neq \emptyset$, determine a
pattern $P_3$ such that $\Gamma(P_3) = \Gamma(P_1) \cap \Gamma(P_2)$. Check all the $P_4 \in \mathbf{P}$ to make sure that
$\Gamma(P_3) \neq \Gamma(P_4)$, and if so, add $P_3$ to $\mathbf{P}$. Do this iteratively until no more changes
can be made to $\mathbf{P}$. Then $\mathbf{P}$ is the set of patterns such that, for all $P \in \mathbf{P}$, $\epsilon(P)$ and
$\alpha(P)$ need to be defined.
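The iterative derivation just described is a fixpoint computation, sketched below. To keep the example self-contained, each pattern is modeled directly as the (finite) set of facts it matches. This is a simplifying assumption: set intersection stands in for the construction of lemma 30, and set equality stands in for the test of proposition 31. The function name is hypothetical.

```python
from itertools import combinations

def close_under_intersection(patterns):
    """Add non-empty pairwise intersections until nothing changes."""
    closed = set(patterns)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so that `closed` can grow safely.
        for p1, p2 in combinations(list(closed), 2):
            p3 = p1 & p2                 # stand-in for lemma 30
            if p3 and p3 not in closed:  # stand-in for proposition 31
                closed.add(p3)
                changed = True
    return closed

# Three "patterns", each given as the frozenset of facts it matches.
initial = {frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})}
closed = close_under_intersection(initial)
```

In this toy model the loop terminates because only finitely many subsets exist; with real patterns, termination relies on the intersection patterns produced by lemma 30 eventually repeating.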

Lemma 30. Let $P_1 = \mathrm{And}(f_1\ x_1 \ldots x_n)$ and $P_2 = \mathrm{And}(f_2\ y_1 \ldots y_m)$ be patterns.
If no most general unifier$^{11}$ $\sigma$ exists for $f_1$ and $f_2$, then $\Gamma(P_1) \cap \Gamma(P_2) = \emptyset$. If no most
general unifier $\sigma$ exists for $f_1$ and $f_2$ such that for $1 \le i \le n$, $\sigma(x_i)$ is matchable,
and for $1 \le i \le m$, $\sigma(y_i)$ is matchable, then $\Gamma(P_1) \cap \Gamma(P_2) = \emptyset$. Otherwise, let $P_3 =
\mathrm{And}(\sigma(f_1)\ z_1 \ldots z_k)$ where $\{z_i\}_{i=1}^{k}$ is the maximum subset of $\{\sigma(x_i)\}_{i=1}^{n} \cup \{\sigma(y_i)\}_{i=1}^{m}$
such that every $z_i$ is a restriction. Then $\Gamma(P_1) \cap \Gamma(P_2) = \Gamma(P_3)$.

Proof. This proof assumes familiarity with unification and unifiers. For an overview,
refer to [57].

First proving that $\Gamma(P_1) \cap \Gamma(P_2) \subseteq \Gamma(P_3)$. $f \in \Gamma(P_1) \cap \Gamma(P_2)$ iff $f \in \Gamma(P_1)$
and $f \in \Gamma(P_2)$, which is true iff there exist ground substitutions $\sigma_1$ and $\sigma_2$ such
that: $\sigma_1(f_1) = f$; $\sigma_2(f_2) = f$; for $1 \le i \le n$, $\sigma_1(x_i)$ is matchable; and for $1 \le j \le m$,
$\sigma_2(y_j)$ is matchable. Then $\sigma_1 \cup \sigma_2$ is a unifier of $f_1$ and $f_2$ since $[\sigma_1 \cup \sigma_2](f_1) = f =
[\sigma_1 \cup \sigma_2](f_2)$.$^{12}$ Since $\sigma$ is a most general unifier of $f_1$ and $f_2$, it holds that there
is a substitution $\sigma'$ such that $\sigma' \circ \sigma = [\sigma_1 \cup \sigma_2]$. So then there exists a substitution
$\sigma'$ such that $\sigma'(\sigma(f_1)) = f$. It also holds that $\sigma'(z_i)$ is matchable for $1 \le i \le k$.
Therefore, $f \in \Gamma(P_3)$.

Now proving that $\Gamma(P_1) \cap \Gamma(P_2) \supseteq \Gamma(P_3)$. $f \in \Gamma(P_3)$ means that there exists a
substitution $\sigma'$ such that $\sigma'(\sigma(f_1)) = f$, which means that $\sigma' \circ \sigma$ is a substitution such
that $\sigma' \circ \sigma(f_1) = f$. The same can be said for $f_2$ since $\sigma$ is a unifier of $f_1$ and $f_2$ (that
is, because $\sigma(f_1) = \sigma(f_2)$). It also holds that $\sigma' \circ \sigma(x_i)$ is matchable for $1 \le i \le n$
and that $\sigma' \circ \sigma(y_j)$ is matchable for $1 \le j \le m$. Therefore, $f \in \Gamma(P_1) \cap \Gamma(P_2)$. □

11 Note that determining a most general unifier is a well-studied and efficiently-solvable problem
in computer science. [57]
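As a rough illustration of the construction in lemma 30, the sketch below computes the intersection pattern for the simplest case: flat triple patterns with no restrictions, so the matchability checks are omitted. The tuple representation, the helper names, and the convention that variables start with `?` are all assumptions made for this example.

```python
def is_var(term):
    return term.startswith("?")

def resolve(term, subst):
    # Follow variable bindings until a constant or an unbound variable.
    while is_var(term) and term in subst:
        term = subst[term]
    return term

def intersect(p1, p2):
    """Return a pattern matching exactly the triples matched by both
    p1 and p2, or None if no unifier exists.

    Assumes p1 and p2 share no variable names (rename beforehand if needed).
    """
    subst = {}
    for a, b in zip(p1, p2):
        a, b = resolve(a, subst), resolve(b, subst)
        if a == b:
            continue
        if is_var(a):
            subst[a] = b
        elif is_var(b):
            subst[b] = a
        else:
            return None  # two distinct constants cannot be unified
    # Apply the most general unifier to p1 to obtain the new pattern.
    return tuple(resolve(t, subst) for t in p1)

both = intersect(("?s", "rdf:type", "?c"), ("?x", "?p", "?x"))
none = intersect(("?s", "rdfs:domain", "?o"), ("?x", "rdfs:range", "?y"))
```

Because the terms are flat (no nested function symbols), unification here needs no occurs check. General patterns with restrictions would also require the matchability tests stated in the lemma.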

Proposition 31. Let $P_1 = \mathrm{And}(f_1\ x_1 \ldots x_n)$ and $P_2 = \mathrm{And}(f_2\ y_1 \ldots y_m)$ be
patterns. Remove any duplicate restrictions from $P_1$; do the same for $P_2$. Rename
the variables in $P_1$ such that if the $i$th term (directly) in $f_1$ is a variable, rename it
?x$i$.$^{13}$ Do the same for $P_2$. Then using some total ordering $\le$ on restrictions, order
the restrictions in $P_1$ and $P_2$ accordingly. Then, $\Gamma(P_1) = \Gamma(P_2)$ iff $P_1 = P_2$.

Now to address the second question: how does $\mathbf{P}$ translate into the
2SAT reduction? $\mathbf{P}$ is the set of patterns over which $\alpha$ and $\epsilon$ must be defined. If $\alpha$
and $\epsilon$ are to make up a valid pattern replicator, they must satisfy all the inherent
conditions of the definition of pattern replicator. Granted, the bulleted conditions
of definition 61 are tautological, stating that for any pattern $P$ it must hold that
$\alpha(P) \vee \neg\alpha(P)$ and $\epsilon(P) \vee \neg\epsilon(P)$. Less trivial, though, is meeting the conditions of
being a pattern assignment. These are outlined in lemma 32.

Lemma 32. A pair of functions $(\epsilon, \alpha)$ mapping patterns to boolean values constitutes
a valid pattern replicator if the following hold for any pattern $P$:

• $\alpha(P) \rightarrow \neg\epsilon(P)$;

• for any pattern $P'$ such that $\Gamma(P') \subseteq \Gamma(P)$,

  - $\alpha(P) \rightarrow \alpha(P')$;

  - $\epsilon(P) \rightarrow \epsilon(P')$.

12 Here I have assumed that $P_1$ and $P_2$ have no variables with the same name. If that is not the
case, then it can be enforced by simply renaming variables in $P_1$ and $P_2$, which will not change
the values of $\Gamma(P_1)$ or $\Gamma(P_2)$.

13 Do this simultaneously for each variable.

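Proof. Let $(\epsilon, \alpha)$ correspond to $(\Phi_{\mathcal{N}}, \Omega_{\mathcal{N}}, \Theta_{\mathcal{N}})$ as defined by established notation.

Considering $\alpha(P) \rightarrow \neg\epsilon(P)$, there are three distinct cases. First, $\alpha(P) \wedge
\neg\epsilon(P) \equiv [\Phi_{\mathcal{N}}(P) = \mathcal{N}] \wedge [\Omega_{\mathcal{N}}(P) = \mathcal{N}]$. Second, $\neg\alpha(P) \wedge \neg\epsilon(P) \equiv [\Phi_{\mathcal{N}}(P) =
\emptyset] \wedge [\Omega_{\mathcal{N}}(P) = \mathcal{N}]$. Third, $\neg\alpha(P) \wedge \epsilon(P) \equiv [\Phi_{\mathcal{N}}(P) = \emptyset] \wedge [\Omega_{\mathcal{N}}(P) = \emptyset]$. In all cases,
$\Phi_{\mathcal{N}}(P) \subseteq \Omega_{\mathcal{N}}(P) \subseteq \Theta_{\mathcal{N}}(P) = \mathcal{N}$.

Considering $\alpha(P) \rightarrow \alpha(P')$, there are three distinct cases. First, $\alpha(P) \wedge
\alpha(P') \equiv [\Phi_{\mathcal{N}}(P) = \mathcal{N}] \wedge [\Phi_{\mathcal{N}}(P') = \mathcal{N}]$. Second, $\neg\alpha(P) \wedge \alpha(P') \equiv [\Phi_{\mathcal{N}}(P) =
\emptyset] \wedge [\Phi_{\mathcal{N}}(P') = \mathcal{N}]$. Third, $\neg\alpha(P) \wedge \neg\alpha(P') \equiv [\Phi_{\mathcal{N}}(P) = \emptyset] \wedge [\Phi_{\mathcal{N}}(P') = \emptyset]$. In all
cases, $\Phi_{\mathcal{N}}(P) \subseteq \Phi_{\mathcal{N}}(P')$.

Considering $\epsilon(P) \rightarrow \epsilon(P')$, there are three distinct cases. First, $\epsilon(P) \wedge \epsilon(P') \equiv
[\Omega_{\mathcal{N}}(P) = \emptyset] \wedge [\Omega_{\mathcal{N}}(P') = \emptyset]$. Second, $\neg\epsilon(P) \wedge \epsilon(P') \equiv [\Omega_{\mathcal{N}}(P) = \mathcal{N}] \wedge [\Omega_{\mathcal{N}}(P') = \emptyset]$.
Third, $\neg\epsilon(P) \wedge \neg\epsilon(P') \equiv [\Omega_{\mathcal{N}}(P) = \mathcal{N}] \wedge [\Omega_{\mathcal{N}}(P') = \mathcal{N}]$. In all cases, $\Omega_{\mathcal{N}}(P') \subseteq
\Omega_{\mathcal{N}}(P)$.

Therefore, by definition 58, $(\epsilon, \alpha)$ is a valid pattern assignment and a valid
pattern replicator. □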

Problem 2 (PConds). For a finite set of patterns $\mathbf{P}$ and functions $\epsilon$ and $\alpha$ mapping
patterns in $\mathbf{P}$ to boolean values, determine whether the conditions of lemma 32 are
met.

Proposition  33.  PConds  is  reducible  to  2SAT. 

Proof.  By  inspection  of  the  conditions  of  lemma  32.  □ 
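The inspection can be spelled out as follows. Each condition of lemma 32 is an implication between two literals, and any implication $a \rightarrow b$ is equivalent to the two-literal clause $\neg a \vee b$. Writing $\epsilon$ and $\alpha$ for the two functions of the pattern replicator, and letting $P'$ range over the patterns whose matched facts are a subset of those matched by $P$, the clauses are:

```latex
\begin{align*}
\alpha(P) \rightarrow \neg\epsilon(P) &\;\equiv\; \neg\alpha(P) \vee \neg\epsilon(P) \\
\alpha(P) \rightarrow \alpha(P')      &\;\equiv\; \neg\alpha(P) \vee \alpha(P') \\
\epsilon(P) \rightarrow \epsilon(P')  &\;\equiv\; \neg\epsilon(P) \vee \epsilon(P')
\end{align*}
```

Since the set of patterns is finite, there are finitely many such clauses (at most one of the first kind per pattern and two per ordered pair of patterns), and their conjunction is a 2SAT formula.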

Finally, the main theorem regarding reduction to 2SAT can be formulated.
It essentially states that since the sufficient conditions for rules with replication
schemes and for validity of replication schemes are all 2SAT formulas, any
assignment of variables satisfying the 2SAT formulas corresponds to a replication
scheme for which parallel inference (for a given polarized ruleset, using a RAOC) is
correct.


Theorem 34. Given

• a polarized ruleset $R$;

• a finite set of patterns $\mathbf{P}$ such that $\mathbf{P} \supseteq \bigcup_{r \in R} \mathcal{P}(r)$ and for any $P_1, P_2 \in \mathbf{P}$ such
that $\Gamma(P_1) \cap \Gamma(P_2) \neq \emptyset$, there exists $P_3 \in \mathbf{P}$ such that $\Gamma(P_3) = \Gamma(P_1) \cap \Gamma(P_2)$;

letting

• $\varphi$ be the 2SAT formula derived from $R$ as described in lemma 29;

• $\gamma$ be the 2SAT formula derived from $\mathbf{P}$ as described in proposition 33;

then any satisfying assignment of variables in the 2SAT formula $\varphi \wedge \gamma$ implies a pattern replicator
$(\epsilon, \alpha)$ such that any program $\Pi = (\mathcal{I}, \mathcal{H}_{fix}, \mathcal{S}, R)$ is cyclically $\mathcal{R}$-parallel where

• $\mathcal{R}$ is any replication scheme conforming to $(\epsilon, \alpha)$;

• $\mathcal{I}$ is any information keeper;

• $\mathcal{S}$ is a RAOC.

Proof. By proposition 33, a solution to $\gamma$ implies two functions $\epsilon'$ and $\alpha'$ mapping
patterns in $\mathbf{P}$ to boolean values such that the conditions of lemma 32 are met. Then
there exists a pattern replicator $(\epsilon, \alpha)$ such that for all $P \in \mathbf{P}$, $\epsilon'(P) = \epsilon(P)$ and
$\alpha'(P) = \alpha(P)$. Let $\mathcal{R}$ be any replication scheme that conforms to $(\epsilon, \alpha)$. $\varphi$ is
satisfied, which by lemma 29 means that the conditions of corollaries 25, 26, 27, and
28 for $R$ are satisfied. Then by corollary 22, $\Pi$ is cyclically $\mathcal{R}$-parallel. □

Corollary 35. In addition to the conditions of theorem 34, if $\Pi$ terminates, then
$\Pi$ is weakly $\mathcal{R}$-parallel.

Proof. Follows quickly from theorem 34 and corollary 11. □

Having  not  only  determined  conditions  under  which  parallel  inference  is  correct 
(for  a  polarized  ruleset  with  a  RAOC),  but  having  also  devised  a  way  to  test  those 
conditions,  it  would  be  instructive  to  test  a  common  ruleset  used  to  perform  inference 
over  RDF  data  crawled  from  the  Semantic  Web. 

Table 4.1 contains the CoreRDFS ruleset, the RDFS rules that are most widely
valued and supported. Reducing the ruleset to a 2SAT formula as described and then
using a SAT solver$^{14}$ to enumerate the solutions, I found there is only one solution,
which corresponds to replicating all facts. It is trivially true that replicating all
facts is a solution for correct parallel inference with any ruleset, but it defeats the
purpose of parallelization, which is to improve performance or achieve something
that is not feasible on a single machine.

Therefore, this solution is unwanted, and it can be avoided by adding another
clause to the 2SAT formula. Letting $P$ = And(?s[?p->?o]) be the pattern representing
all frame facts (or triples, in the case of RDF), simply add the following
clause to the 2SAT formula: $\neg\alpha(P) \vee \neg\alpha(P)$. Adding this clause states that a
replication scheme is not allowed to replicate all the frame facts (or triples).

Adding that clause to the 2SAT formula derived from the CoreRDFS ruleset,
it is found that there is no solution. The question then is, if correct parallel inference
cannot be achieved for the CoreRDFS ruleset, then for what portion of the
CoreRDFS ruleset (if any) can correct parallel inference be achieved? That is the
topic of the following section.

Before continuing on, though, the RDFS ruleset given in table B.1 was similarly
tested and no solutions were found. The same holds for the OWL2RL ruleset given
in table B.6. These results should not be surprising given recent literature. Hendler
and I [21] explicitly disallowed troublesome data that would cause parallel inference
to be incorrect. Recently, Patel-Schneider gave a more in-depth analysis as to why
(non-trivial, embarrassingly) parallel inference with the RDFS ruleset is incorrect
[45].

4.2.2  Eliminating  Rules  by  Reduction  to  3SAT 

The reduction to 2SAT from the previous section is useful for verifying whether
a given polarized ruleset and replication scheme can result in correct parallel inference
(when using a RAOC). However, in many cases, correct parallel inference is
not achievable, and it is not readily apparent how to change the rules to achieve
correct parallel inference.

14 Specifically, I used relsat [58] version 2.2 [59].

Table 4.1: The CoreRDFS Ruleset

Rule ID    If And(...)                     Then Do(Assert(...))
scm-spo    ?p1[rdfs:subPropertyOf->?p2]    ?p1[rdfs:subPropertyOf->?p3]
           ?p2[rdfs:subPropertyOf->?p3]
scm-sco    ?c1[rdfs:subClassOf->?c2]       ?c1[rdfs:subClassOf->?c3]
           ?c2[rdfs:subClassOf->?c3]
prp-spo1   ?p1[rdfs:subPropertyOf->?p2]    ?x[?p2->?y]
           ?x[?p1->?y]
prp-dom    ?p[rdfs:domain->?c]             ?x[rdf:type->?c]
           ?x[?p->?y]
prp-rng    ?p[rdfs:range->?c]              ?y[rdf:type->?c]
           ?x[?p->?y]
cax-sco    ?c1[rdfs:subClassOf->?c2]       ?x[rdf:type->?c2]
           ?x[rdf:type->?c1]

Perhaps it can be determined which cases cause (non-trivial) parallel inference
to be incorrect and, if appropriate$^{15}$, eliminate those cases. Finding cases that cause
parallel inference to be incorrect can be done by modifying the reduction to 2SAT
into a reduction to 3SAT.

The intuition behind the idea is rather straightforward. For every clause generated
from a rule $r$ in the reduction in the proof of lemma 29, simply add another
literal to each clause, denoted $\chi(r)$. If $\chi(r)$ is assigned a value of one in the solution,
then it means that rule $r$ has been eliminated. An example will help illustrate this
idea.

Consider rule prp-spo1 from table 4.1. From the reduction in the proof of
lemma 29, the following formula would be generated, where $P_1$ = And(?s[?p1->?o]),
$P_2$ = And(?s[?p2->?o]), and $P_{sp}$ = And(?p1[rdfs:subPropertyOf->?p2]).

$[\alpha(P_{sp}) \vee \alpha(P_1)] \wedge [\epsilon(P_2) \vee \alpha(P_1)] \wedge [\epsilon(P_2) \vee \alpha(P_{sp})]$

This formula represents the part of the 2SAT formula derived from rule prp-spo1
such that, when satisfied with the rest of the formula, correct parallel inference can
be achieved with prp-spo1. Suppose, though, that the conditions cannot be met.
Then we will want to consider the possibility of eliminating prp-spo1. Letting $r$ be
rule prp-spo1, we can change the formula to allow for that possibility.

$\chi(r) \vee [[\alpha(P_{sp}) \vee \alpha(P_1)] \wedge [\epsilon(P_2) \vee \alpha(P_1)] \wedge [\epsilon(P_2) \vee \alpha(P_{sp})]]$

Then by distribution, the following is equivalent.

$[\chi(r) \vee \alpha(P_{sp}) \vee \alpha(P_1)] \wedge [\chi(r) \vee \epsilon(P_2) \vee \alpha(P_1)] \wedge [\chi(r) \vee \epsilon(P_2) \vee \alpha(P_{sp})]$

Clearly, this is a 3SAT formula.

15 To some use case. No use case is assumed herein, but those who utilize this approach will
likely not be willing to give up all cases preventing correct parallel inference.

Using  this  reduction  to  3SAT  instead  of  the  reduction  to  2SAT  from  lemma 
29,  the  overall  reduction  to  2SAT  in  theorem  34  becomes  a  reduction  to  3SAT.  Now 
a  solution  to  such  a  3SAT  formula  corresponds  to  a  replication  scheme  and  a  set  of 
rules  to  be  eliminated,  such  that  parallel  inference  will  be  correct  (with  a  RAOC). 
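The augmentation itself is mechanical, as the following minimal sketch suggests. The literal encoding (pairs of a function name and an argument) and the function name `add_elimination_literal` are assumptions made for illustration; the clauses are those of the prp-spo1 example.

```python
def add_elimination_literal(rule_id, clauses):
    """Prepend the elimination literal chi(rule_id) to every clause of a rule."""
    chi = ("chi", rule_id)
    return [(chi,) + tuple(clause) for clause in clauses]

# Two-literal clauses for the prp-spo1 example.
two_sat = [
    (("alpha", "Psp"), ("alpha", "P1")),
    (("epsilon", "P2"), ("alpha", "P1")),
    (("epsilon", "P2"), ("alpha", "Psp")),
]
three_sat = add_elimination_literal("prp-spo1", two_sat)
```

Setting the `chi` literal to true satisfies all three clauses at once, which is exactly the effect of eliminating the rule. Clauses that do not come from a rule (those enforcing a valid pattern replicator) are left with two literals.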

While  this  seems  like  a  good  idea  in  theory,  in  practice,  the  3SAT  formula  will 
often  have  a  large  number  of  solutions.  Enumerating  all  the  solutions,  or  choosing 
an  optimal  solution,  becomes  practically  intractable  for  non-trivial  rulesets.  This 
issue  is  addressed  in  the  following  section. 

4.2.3  Methodology  for  Reducing  the  Search  Space 

In this section, a methodology is presented for reducing the search space for
satisfactions of 3SAT formulas derived from rulesets and pattern replicators as demonstrated
in the previous two sections. This methodology is imprecise by nature,
making use of intuition and heuristics. Therefore, there is no guarantee that the
methodology will provide some sort of optimal solution. Regardless, its practical
value will be demonstrated by using it to restrict common rulesets for correct, parallel
inference.

First, a notion of expected factset is needed. In the following, let an expected
factset be a factset that is likely to require inference for some given scenario. At
this point, I do not assume any particular scenario, but this notion of expected
factset is helpful in capturing any intuition one might have about the factsets under
consideration.

Definition 62. A pattern $P$ is said to have high selectivity iff for any expected
factset $F$, $|\Gamma(P) \cap F| / |F|$ is a (subjectively) small fraction.
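As a concrete reading of definition 62, the sketch below computes the fraction of a factset matched by a triple pattern. The tuple representation of triples and patterns, the `?` prefix for variables, and the sample data are assumptions made for this example. What counts as "small" remains a judgment call, as the definition says.

```python
def matches(pattern, triple):
    """True if the triple pattern matches the triple."""
    binding = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A repeated variable must bind to the same value each time.
            if binding.setdefault(term, value) != value:
                return False
        elif term != value:
            return False
    return True

def selectivity_ratio(pattern, factset):
    """Fraction of the factset matched by the pattern."""
    return sum(matches(pattern, t) for t in factset) / len(factset)

factset = [
    ("ex:a", "rdf:type", "ex:C"),
    ("ex:b", "rdf:type", "ex:C"),
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:C", "rdfs:subClassOf", "ex:D"),
]
ratio = selectivity_ratio(("?c1", "rdfs:subClassOf", "?c2"), factset)
```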

In other words, a pattern is said to be highly selective iff it is expected that
that pattern will match relatively few facts. In this way, a distinction can be made
between patterns for which replication of matched facts comes with small cost, and
other patterns. Defining a few more characteristics of patterns will further help in
describing the methodology steps.

Definition 63. A pattern $P$ is said to be computable iff $\Gamma(P)$ contains only independent
facts.

The  notion  of  a  computable  pattern  is  that  all  the  facts  matched  by  the  pattern 
can  be  determined  dynamically  without  the  need  for  explicit  storage  of  facts.  Thus, 
replication  of  facts  matched  by  such  patterns  does  not  exactly  result  in  any  physical 
replication  of  facts. 

Definition 64. A pattern $P_1$ is said to be related to a pattern $P_2$ iff $\Gamma(P_1) \cap \Gamma(P_2) \neq \emptyset$.

Definition 65. A pattern $P_1$ is said to be a special case of a pattern $P_2$ iff $\Gamma(P_1) \subseteq \Gamma(P_2)$.

Patterns  are  related  if  they  both  match  some  same  fact.  This  means  that 
making  decisions  about  replication  or  arbitrary  placement  of  facts  matched  by  one 
pattern  can  have  an  impact  on  how  the  other  pattern  is  classified.  Special  cases 
result  in  even  stronger  implications  between  patterns.  These  specific  implications 
between  patterns  have  already  been  proven  in  section  4.1.  This  terminology  is  just 
introduced  here  to  simplify  the  description  of  the  methodology  steps. 

Assume some (polarized) ruleset $R$ under consideration. Step 1 is to assume
that facts matched by highly selective patterns will be replicated. The idea is that, by
definition, this will be a small fraction of the data, so go ahead and be liberal with
replication of these facts.

Step 1. For every highly selective or computable pattern $P$ occurring in some rule
$r \in R$, replicate the facts in $\Gamma(P)$ by adding the clause $\alpha(P)$ to the SAT formula.
Let M denote this set of patterns.

Step 2 is to restrict or “split” rules into an equivalent set of rules such that
each new rule falls into one of two categories: a rule that infers facts matched by a
pattern in M, or a rule that infers facts not matched by a pattern in M. In essence,
we are dividing rules into disjoint cases. The idea of “splitting” here is vague, but
it will be illustrated in an example later in this section.

Step 2. For each rule $r \in R$, “split” $r$ into multiple rules such that for each new rule
$r'$, all the patterns in $\mathcal{P}_A(r')$ are either special cases of patterns in M or unrelated
to patterns in M. Denote the new ruleset $R'$.

Step 3 is perhaps the most arbitrary step, although it can be quite useful for
reducing the search space. It simply says to use intuition to try and determine some
patterns for which matched facts should be placed arbitrarily among processors.
This step can be skipped if desired, but at the risk that the search space will be less
constrained (good for finding a more optimal solution, bad for tractability).

Step 3. Applying intuition, choose some patterns $P$ that are not highly selective
and allow the facts in $\Gamma(P)$ to be placed arbitrarily by adding the clause $\epsilon(P)$ to
the SAT formula. Let A denote this set of patterns.

Step 4 simply inspects the rules to check which ones have condition formulas
that can only match facts that are matched by patterns in M. In other words, these are
rules for which instances will be fired by all processors. Clearly, such rules are safe
for parallel inference, and so it is enforced that they not be eliminated.

Step 4. For each rule $r \in R'$, if every pattern in $\mathcal{P}_C(r)$ is a special case of a pattern
in M, then add $\neg\chi(r)$ to the SAT formula.

Step  5  is  a  complicated  step,  although  more  intuitive  than  it  may  appear. 
Simply,  if  a  rule  has  no  negation  or  retraction,  and  at  most  one  of  its  condition 
patterns  has  not  been  selected  for  replication  of  facts,  and  every  assertion  is  allowed 
to  be  arbitrarily  placed,  then  do  not  eliminate  the  rule.  This  follows  directly  from 
corollaries  25,  26,  27,  and  28. 

Step 5. For each rule $r \in R'$: if $\mathcal{P}_C^-(r) = \emptyset$ and $\mathcal{P}_A^-(r) = \emptyset$; and if at most one
pattern in $\mathcal{P}_C^+(r)$ is not a special case of a pattern in M; and if all the patterns in
$\mathcal{P}_A^+(r)$ are special cases of patterns in A; then add $\neg\chi(r)$ to the SAT formula.

Steps 6 and 7 simply perform the reduction to 3SAT described in section
4.2.2. More interestingly, though, step 8 turns the 3SAT reduction into a heuristic
Min-Cost 3SAT reduction. In the Min-Cost variations of SAT problems, variables
in the SAT formulas are associated with weights (usually non-negative integers),
and the goal is to find an assignment of the variables such that the formula is
satisfied and the sum of the weights of the variables assigned a value of one is
minimized. More formally, letting $w(v)$ be the weight of a variable $v$ and $a(v)$ be the
value assigned to $v$ by assignment $a$, the goal is to determine an $a$ such that the
formula is satisfied and there is no other assignment $a'$ such that the formula is
satisfied and $\sum_v a'(v) \cdot w(v) < \sum_v a(v) \cdot w(v)$.
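For small formulas, the Min-Cost objective can be checked by brute force, as in the sketch below. This is only an illustration of the definition, exponential in the number of variables, and not a practical solver. The function name, the clause encoding, and the example weights are assumptions.

```python
from itertools import product

def min_cost_sat(variables, clauses, weight):
    """Brute-force Min-Cost SAT.

    A clause is a list of (variable, wanted_value) literals. Returns
    (cost, assignment) for a cheapest satisfying assignment, or None
    if the formula is unsatisfiable.
    """
    best = None
    for values in product([False, True], repeat=len(variables)):
        a = dict(zip(variables, values))
        if all(any(a[v] == want for v, want in clause) for clause in clauses):
            # Cost is the total weight of the variables assigned one (True).
            cost = sum(weight[v] for v in variables if a[v])
            if best is None or cost < best[0]:
                best = (cost, a)
    return best

# (x or y) with x costing 3 and y costing 1: the cheapest solution sets only y.
best = min_cost_sat(["x", "y"], [[("x", True), ("y", True)]], {"x": 3, "y": 1})
```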

This “reduction” to Min-Cost 3SAT is not a true reduction because an optimal
solution to the weighted 3SAT formula does not correspond to an optimal
replication scheme and set of eliminated rules. This is because the variable weights
are determined by some imprecise heuristics. Such heuristics are proposed later in
this section.

Step 6. Initializing $\mathbf{P}$ to $\mathrm{M} \cup \mathrm{A} \cup \bigcup_{r \in R'} \mathcal{P}(r)$, iteratively add to $\mathbf{P}$ any pattern $P_3$ such
that there exist $P_1, P_2 \in \mathbf{P}$ and $\Gamma(P_3) = \Gamma(P_1) \cap \Gamma(P_2)$ where no such $P_3$ already
exists in $\mathbf{P}$. Do so iteratively until no changes can be made to $\mathbf{P}$.

Step 7. Generate 3SAT clauses using the modified reduction of theorem 34 as
described in section 4.2.2, with $R'$ as the ruleset and $\mathbf{P}$ as the set of patterns representing
the (partial) domain for pattern replicators.

Step 8. Heuristically assign weights to the variables in the 3SAT formula, and then
use the pattern replicator corresponding to an optimal solution of the 3SAT formula.

An example of the application of this methodology would be instructive. Consider
the CoreRDFS ruleset from table 4.1. Let an expected factset be any factset
consisting only of frames (corresponding to RDF triples) and the usual independent
facts, and make the assumption that the portion of the factset constituting
terminological data (or TBox in description logic vernacular) is small enough to be
considered highly selective. This latter assumption is very common in the literature
and has proven helpful in scaling inference to large datasets [19, 20, 21, 30, 46, 48, 51].
rdf:type triples are not included because they are generally not considered terminological
and almost certainly constitute a relatively large portion of an expected
factset.

Keep  in  mind,  none  of  these  steps  are  necessary  for  the  reduction  to  correctly 
test  the  sufficient  conditions  for  correct  parallel  inference.  These  steps  are  just  to 
help  reduce  the  search  space.  Prior  to  taking  any  steps,  if  the  reduction  to  3SAT 
is  performed  immediately  without  forcing  any  variable  assignments,  there  are  809 
possible  solutions.  Admittedly,  the  search  space  for  the  3SAT  formula  derived  from 
the  CoreRDFS  ruleset  is  not  insurmountably  large,  but  this  is  just  an  example  to 
illustrate  the  methodology. 

For step 1, select all the patterns in the CoreRDFS ruleset that are highly
selective. Since it has been assumed that any pattern selecting only terminological
data is highly selective, the choice is clear. M will consist of the following patterns.

    And(?x1[rdfs:subPropertyOf->?x3])
    And(?x1[rdfs:subClassOf->?x3])
    And(?x1[rdfs:domain->?x3])
    And(?x1[rdfs:range->?x3])

After step 1, there are 108 solutions.

For step 2, notice that for $r$ being scm-spo or scm-sco, it naturally holds that
the patterns in $\mathcal{P}_A(r)$ are special cases of the patterns in M. For $r$ being prp-dom,
prp-rng, or cax-sco, it naturally holds that the patterns in $\mathcal{P}_A(r)$ are unrelated
to the patterns in M. Therefore, these five rules need not be split (or rather, they
are split into themselves). The only rule in need of modification is prp-spo1. It is
split by restricting the variables in the action block so that each rule either produces
facts that must be replicated or facts that need not necessarily be replicated. In the
former case, there are four such rules.

    If And( ?p1[rdfs:subPropertyOf->rdfs:subPropertyOf]
            ?x[?p1->?y] )
    Then Do(Assert( ?x[rdfs:subPropertyOf->?y] ))

    If And( ?p1[rdfs:subPropertyOf->rdfs:subClassOf]
            ?x[?p1->?y] )
    Then Do(Assert( ?x[rdfs:subClassOf->?y] ))

    If And( ?p1[rdfs:subPropertyOf->rdfs:domain]
            ?x[?p1->?y] )
    Then Do(Assert( ?x[rdfs:domain->?y] ))

    If And( ?p1[rdfs:subPropertyOf->rdfs:range]
            ?x[?p1->?y] )
    Then Do(Assert( ?x[rdfs:range->?y] ))

For the sake of brevity, in the remainder of the paper, I will group together such rules using a newly introduced IN keyword (it could be considered a special builtin predicate for this very purpose, like pred:list-contains).

If And( ?p1[rdfs:subPropertyOf->?p2]
        ?x[?p1->?y]
        ?p2 IN List(rdfs:subPropertyOf
                    rdfs:subClassOf
                    rdfs:domain
                    rdfs:range) )
Then Do(Assert( ?x[?p2->?y] ))

In  the  latter  case,  there  is  one  such  rule. 

If And( ?p1[rdfs:subPropertyOf->?p2]
        ?x[?p1->?y]
        Not(?p2 = rdfs:subPropertyOf)
        Not(?p2 = rdfs:subClassOf)
        Not(?p2 = rdfs:domain)
        Not(?p2 = rdfs:range) )
Then Do(Assert( ?x[?p2->?y] ))

For brevity, in the remainder of the paper, I will compress the restrictions in the condition using a newly introduced NOTIN keyword (this could also be considered a special builtin predicate for this very purpose).

If And( ?p1[rdfs:subPropertyOf->?p2]
        ?x[?p1->?y]
        ?p2 NOTIN List(rdfs:subPropertyOf
                       rdfs:subClassOf
                       rdfs:domain
                       rdfs:range) )
Then Do(Assert( ?x[?p2->?y] ))

After  step  2,  using  the  new  ruleset  R',  there  are  360  solutions.  Note  that  this  step 
actually  increased  the  search  space.  This  is  caused  by  splitting  the  rules,  which 
increases  the  number  of  rules  and  patterns,  which  increases  the  number  of  clauses 
and  variables  in  the  SAT  formula,  which  can  increase  the  number  of  solutions.  Step 
2  is  really  meant  to  try  and  preserve  some  of  the  semantics  of  the  original  ruleset 
by  splitting  the  rules  into  cases  based  on  whether  or  not  they  can  infer  (necessarily) 
replicated  facts.  Those  that  do  must  meet  more  conditions. 
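
As a rough illustration of the case split in step 2, the following Python sketch applies prp-spo1 once to a small set of triples and separates the inferences by whether the inferred predicate is one of the four terminological properties, mirroring the IN and NOTIN restrictions. It is not the implementation used in this work; the representation of triples as (subject, predicate, object) tuples, the function name, and the example data are all assumptions made for illustration.

```python
# Hypothetical sketch: triples are (subject, predicate, object) tuples of strings.
TERMINOLOGICAL = {
    "rdfs:subPropertyOf",
    "rdfs:subClassOf",
    "rdfs:domain",
    "rdfs:range",
}


def apply_prp_spo1_split(triples):
    """Apply prp-spo1 once and split the inferences into the IN and NOTIN cases."""
    in_case = set()     # inferred predicate is IN the terminological list
    notin_case = set()  # inferred predicate is NOTIN the terminological list
    sub_props = [(s, o) for (s, p, o) in triples if p == "rdfs:subPropertyOf"]
    for p1, p2 in sub_props:
        for (x, p, y) in triples:
            if p == p1:
                inferred = (x, p2, y)
                if p2 in TERMINOLOGICAL:
                    in_case.add(inferred)
                else:
                    notin_case.add(inferred)
    return in_case, notin_case


# Made-up example data.
facts = {
    ("ex:partOf", "rdfs:subPropertyOf", "ex:relatedTo"),
    ("ex:kindOf", "rdfs:subPropertyOf", "rdfs:subClassOf"),
    ("ex:wheel", "ex:partOf", "ex:car"),
    ("ex:Dog", "ex:kindOf", "ex:Animal"),
}
in_case, notin_case = apply_prp_spo1_split(facts)
```

Here the inference about ex:Dog falls into the IN case because it produces an rdfs:subClassOf triple, which would have to be replicated, while the inference about ex:wheel falls into the NOTIN case.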

For  step  3,  it  seems  like  a  good  idea  that  triples  matching  the  following  pattern 
should  be  placed  arbitrarily. 

And(?x1[?x2->?x3]
    ?x2 NOTIN List(rdfs:subPropertyOf
                   rdfs:subClassOf
                   rdfs:domain
                   rdfs:range) )

The intuition applied here (be it helpful or not) is that anything that is not considered terminological data should be placed arbitrarily. After this step, there are 128 solutions.

For step 4, rules scm-spo and scm-sco are kept from being eliminated because the patterns in their conditions are all special cases of replication patterns. For step 5, rules prp-dom, prp-rng, cax-sco, and the restricted version of prp-spo1 in which inferred facts are not necessarily replicated are all kept from elimination. Then, only one rule is at risk for elimination: the restricted version of prp-spo1 whose inferred facts match the following pattern.

And(?x1[?x2->?x3]
    ?x2 IN List(rdfs:subPropertyOf
                rdfs:subClassOf
                rdfs:domain
                rdfs:range) )

After  steps  4  and  5,  there  are  only  two  possible  solutions. 

Steps  6  and  7  I  have  done  programmatically,  and  in  this  way  have  been  able  to 
report  the  number  of  solutions  as  I  progress  through  the  steps.  For  the  CoreRDFS 
ruleset,  P  contains  the  following  patterns. 

And(?x1[rdfs:subPropertyOf->?x3])

And(?x1[rdfs:subClassOf->?x3])

And(?x1[rdfs:domain->?x3])

And(?x1[rdfs:range->?x3])

And(?x1[rdf:type->?x3])

And(?x1[?x2->?x3]
    ?x2 NOTIN List(rdfs:subPropertyOf
                   rdfs:subClassOf
                   rdfs:domain
                   rdfs:range) )

And(?x1[?x2->?x3])

Even  though  the  number  of  patterns  is  small,  the  SAT  formula  produced  by  step  7 
has  90  clauses  and  is  too  large  to  display  here  with  any  clarity. 

As already mentioned, there are only two solutions at this point. Since there are so few solutions, step 8 could be skipped, as the two solutions can simply be inspected and the preferred one chosen. However, for the purpose of example, consider the following possible heuristics.

Heuristic 1. For all $P \in \mathcal{P}$, let $weight(e(P)) = 0$.

This heuristic says that arbitrarily assigning facts to processors comes for free. It is a good thing (for parallelization) that data be kept from being replicated (if possible), so no cost should be associated with it.

Heuristic 2. For all $P \in \mathcal{P}$: if $P$ is computable, let $weight(\alpha(P)) = 0$; otherwise, let $weight(\alpha(P)) = 1$.

Computable patterns match only facts that are not physically manifest but rather computationally determined on demand. Thus, “replication” of those facts comes for free. For other patterns, though, an arbitrary cost is associated with replication of their matched facts. This simple heuristic is not very precise. Some patterns represent a much larger number of facts than others, yet the cost for replication of facts for any non-computable pattern is the same. The heuristic is naive, but in practice, it has seemed more effective than initially expected. The reason appears to be that these patterns are often related through common subsets as a result of step 6.

Heuristic 3. For any $r \in R'$, let $weight(x(r)) = \sum_{P \in \mathcal{P}} weight(\alpha(P))$.

This  heuristic  says  that  it  is  always  preferable  to  replicate  data  rather  than 
eliminate  rules,  except  when  all  data  must  be  replicated.  The  assumption  is  that  we 
want  to  preserve  the  originally  intended  semantics  of  the  rules  as  much  as  possible. 
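
The three heuristics can be combined into a simple scoring function. The sketch below is a hypothetical Python illustration only: it assumes that a candidate solution is represented by the set of patterns it replicates and the set of rules it eliminates, and that candidates are compared by total weight with lower being preferred. Neither the representation nor the summation is taken from the code used in this work, and the pattern and rule names are placeholders.

```python
def replication_weight(pattern, computable):
    """Heuristic 2: replicating a computable pattern is free; otherwise it costs 1."""
    return 0 if pattern in computable else 1


def elimination_weight(all_patterns, computable):
    """Heuristic 3: eliminating a rule costs as much as replicating every pattern."""
    return sum(replication_weight(p, computable) for p in all_patterns)


def solution_weight(replicated, eliminated, all_patterns, computable):
    """Total weight of a candidate; arbitrary placement adds nothing (Heuristic 1)."""
    cost = sum(replication_weight(p, computable) for p in replicated)
    cost += len(eliminated) * elimination_weight(all_patterns, computable)
    return cost


# Toy data with placeholder names.
patterns = ["P1", "P2", "P3", "P4"]
computable = {"P4"}

# Candidate A replicates two patterns and eliminates no rules.
weight_a = solution_weight({"P1", "P2"}, set(), patterns, computable)
# Candidate B replicates one pattern and eliminates one rule.
weight_b = solution_weight({"P1"}, {"r1"}, patterns, computable)
```

Under these assumptions candidate A is preferred, since eliminating even a single rule costs as much as replicating every non-computable pattern.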

Finally,  after  step  8,  the  selected,  heuristically  optimal  solution  corresponds 
to  a  pattern  assignment  and  a  set  of  rules  to  eliminate.  Only  one  rule  (or  rather 
four  rules  shown  here  syntactically  as  one  rule)  is  eliminated. 

If And( ?p1[rdfs:subPropertyOf->?p2]
        ?x[?p1->?y]
        ?p2 IN List(rdfs:subPropertyOf
                    rdfs:subClassOf
                    rdfs:domain
                    rdfs:range) )
Then Do(Assert( ?x[?p2->?y] ))

Table 4.2: The Par-CoreRDFS Ruleset

Rule ID     If And(...)                                Then Do(Assert(...))
---------   ----------------------------------------   ----------------------------
scm-spo     ?p1[rdfs:subPropertyOf->?p2]               ?p1[rdfs:subPropertyOf->?p3]
            ?p2[rdfs:subPropertyOf->?p3]
scm-sco     ?c1[rdfs:subClassOf->?c2]                  ?c1[rdfs:subClassOf->?c3]
            ?c2[rdfs:subClassOf->?c3]
prp-spo1*   ?p1[rdfs:subPropertyOf->?p2]               ?x[?p2->?y]
            ?x[?p1->?y]
            ?p2 NOTIN List(rdfs:subPropertyOf
                           rdfs:subClassOf
                           rdfs:domain
                           rdfs:range)
prp-dom     ?p[rdfs:domain->?c]                        ?x[rdf:type->?c]
            ?x[?p->?y]
prp-rng     ?p[rdfs:range->?c]                         ?y[rdf:type->?c]
            ?x[?p->?y]
cax-sco     ?c1[rdfs:subClassOf->?c2]                  ?x[rdf:type->?c2]
            ?x[rdf:type->?c1]

Furthermore, facts matching the following patterns should be replicated.

And(?x1[rdfs:subPropertyOf->?x3])

And(?x1[rdfs:subClassOf->?x3])

And(?x1[rdfs:domain->?x3])

And(?x1[rdfs:range->?x3])

In  this  case,  they  are  the  exact  patterns  that  were  selected  for  replication  in  step  1. 
Then,  parallel  inference  with  the  rules  in  table  4.2  is  correct  when  data  is  distributed 
(or  replicated)  as  just  stated. 
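
To make the resulting replication scheme concrete, the following Python sketch distributes a set of triples over a number of processors: triples whose predicate is one of the four terminological properties are copied to every processor, and all other triples are placed on exactly one processor. Round-robin placement is used here only as one possible form of arbitrary placement; it, the function name, and the tuple representation are assumptions for illustration and not the distribution code used in this work.

```python
REPLICATED_PREDICATES = {
    "rdfs:subPropertyOf",
    "rdfs:subClassOf",
    "rdfs:domain",
    "rdfs:range",
}


def distribute(triples, num_processors):
    """Return one set of triples per processor under the replication scheme."""
    parts = [set() for _ in range(num_processors)]
    next_proc = 0
    for triple in sorted(triples):  # sorted only to make the sketch deterministic
        _, predicate, _ = triple
        if predicate in REPLICATED_PREDICATES:
            for part in parts:      # replicate to all processors
                part.add(triple)
        else:
            parts[next_proc].add(triple)  # arbitrary placement (round-robin here)
            next_proc = (next_proc + 1) % num_processors
    return parts


# Made-up example data.
data = {
    ("ex:Dog", "rdfs:subClassOf", "ex:Animal"),
    ("ex:rex", "rdf:type", "ex:Dog"),
    ("ex:fido", "rdf:type", "ex:Dog"),
    ("ex:rex", "ex:chases", "ex:fido"),
}
parts = distribute(data, 2)
```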

At  this  point,  it  is  important  to  draw  a  distinction  between  this  work  and 
related/previous  work.  In  related/previous  work,  restrictions  have  been  placed  on 
the  data  (factsets)  and  the  conclusion  has  been,  if  one  does  not  have  such  data  in 
his/her  dataset  (or  such  facts  in  his/her  factset),  then  parallel  inference  is  correct 
(sound and complete) [19, 21]. More concisely, correctness of inference was conditioned on characteristics of the data. The conditions or restrictions have been either
characterized  as  broad  propositions  [21]  or  mired  in  mathematical  formulas  [19]. 

In  this  work,  I  have  taken  a  different  perspective.  Instead  of  saying  inference  is 
correct  for  a  ruleset  R  when  a  factset  F  meets  certain  conditions,  I  am  determining 
that  parallel  inference  is  correct  for  a  specific  approximation  of  the  ruleset  R  (which 
sometimes  is  even  a  very  poor  approximation,  although  not  in  the  case  of  CoreRDFS 
and Par-CoreRDFS) regardless of features of the factset. More concisely, correctness of inference is conditioned on characteristics of the rules, and the conditions are explicitly given by determining which rules should be eliminated.

In some way, this is similar to the perspective proposed by Hitzler and van
Harmelen  [60],  that  sound  and  complete  reasoning  should  be  considered  a  gold 
standard,  and  what  is  really  needed  is  a  determination  of  how  close  to  the  standard 
one  can  come  while  preserving  some  degree  of  scalability.  The  work  of  this  chapter 
represents  a  step  in  that  direction  by  proving  that  under  specific,  explicit  conditions, 
scalability  in  the  form  of  embarrassingly  parallel  inference  is  achievable.  The  only 
question  that  remains  is,  how  close  is  this  inference  to  the  gold  standard?  This 
remains  to  be  determined  in  future  work. 

4.3  Deriving  Rulesets  Amenable  to  Parallel  Inference 

In  this  section,  the  methodology  of  the  previous  section  is  used  to  restrict 
the  RDFS  and  OWL2RL  rulesets  into  versions  for  which  parallel  inference  will  be 
correct  for  some  non-trivial  replication  schemes. 

4.3.1  Restricting  RDFS 

The RDFS ruleset is given in table B.1 in appendix B.¹⁶ Note that this is not the complete RDFS ruleset, as the infinite number of axiomatic triples related to container membership properties have been excluded in order to provide decidable inference.¹⁷ Prior to taking any of the steps in the proposed methodology, the 3SAT formula has 1,030,268,192 solutions.

For  step  1,  I  chose  the  following  “terminological  patterns”  and  computable 
patterns  for  replication. 

And(?x1[rdfs:domain->?x3])

And(?x1[rdfs:range->?x3])

And(?x1[rdfs:subClassOf->?x3])

And(?x1[rdfs:subPropertyOf->?x3])

And(?x1[rdf:type->rdfs:Class])

And(?x1[rdf:type->rdfs:Datatype])

And(?x1[rdf:type->rdfs:ContainerMembershipProperty])

And(External(pred:is-literal-XMLLiteral(?x1)))

And(External(pred:is-literal-PlainLiteral(?x1)))

¹⁶ All tables prefixed with B appear in appendix B due to the length of the tables.

¹⁷ See [53] for a more thorough discussion.

Note that I have excluded the pattern And(?x1[rdf:type->rdf:Property]) even though it could be considered terminological. This reflects some intuition on the matter. Were facts matched by And(?x1[rdf:type->rdf:Property]) required to be replicated, then by corollary 28, all triples would have to be replicated given rule rdf1, or rule rdf1 would have to be eliminated. Additionally, even though the individual axiomatic triples are very selective, I have not forced their replication here because they will be replicated regardless by virtue of the fact that every processor has all the rules. These finer insights reflect the imprecision of the methodology and the value of an understanding of the 3SAT reduction.

After step 1, there are 40,450,304 possible solutions, far fewer than the initial 1,030,268,192, but still a large number. Remember, though, that there is no assurance that I have not precluded an optimal solution by taking these steps, but the methodology has the benefit of making checking the sufficient conditions (via reduction to 3SAT) more tractable and efficient.

Following step 2, there are too many examples to be listed here, but for clarity, rule rdfs7 (same as prp-spo1) is a good single example of how the rules are split (again, using IN and NOTIN to syntactically compress multiple rules into fewer rules).

If And( ?p1[rdfs:subPropertyOf->?p2]
        ?x[?p1->?y]
        ?p2 NOTIN List(rdfs:subPropertyOf
                       rdfs:subClassOf
                       rdfs:domain
                       rdfs:range
                       rdf:type) )
Then Do(Assert( ?x[?p2->?y] ))

If And( ?p1[rdfs:subPropertyOf->?p2]
        ?x[?p1->?y]
        ?p2 IN List(rdfs:subPropertyOf
                    rdfs:subClassOf
                    rdfs:domain
                    rdfs:range) )
Then Do(Assert( ?x[?p2->?y] ))

If And( ?p1[rdfs:subPropertyOf->rdf:type]
        ?x[?p1->?y]
        ?y NOTIN List(rdfs:Class
                      rdfs:Datatype
                      rdfs:ContainerMembershipProperty) )
Then Do(Assert( ?x[rdf:type->?y] ))

If And( ?p1[rdfs:subPropertyOf->rdf:type]
        ?x[?p1->?y]
        ?y IN List(rdfs:Class
                   rdfs:Datatype
                   rdfs:ContainerMembershipProperty) )
Then Do(Assert( ?x[rdf:type->?y] ))

After step 2, there are 1,529,405,440 possible solutions, even more than at the beginning. Again, the number of solutions has grown due to the increased number of patterns and rules. Although it is not unusual for this step to increase the number of solutions, it is an important step to take so that whole rules are not eliminated simply because of some special case. In other words, more than reducing the search space, this step improves preservation of the semantics of the original rules.

For  step  3,  I  chose  to  require  arbitrary  placement  of  the  facts  matching  the 
following  patterns. 

And(?x1[?x2->?x3]
    ?x2 NOTIN List(rdfs:subPropertyOf
                   rdfs:subClassOf
                   rdfs:domain
                   rdfs:range
                   rdf:type) )

And(?x1[rdf:type->?x3]
    ?x3 NOTIN List(rdfs:Class
                   rdfs:Datatype
                   rdfs:ContainerMembershipProperty) )

After  this  step,  there  are  655,360  solutions. 

Performing steps 4 and 5 programmatically, there remain only 10 possible solutions. From step 6, P contains 87 patterns. Using the aforementioned heuristics for step 8, the final solution is as follows. Facts matched by the patterns chosen in step 1 should be replicated, and others can be placed arbitrarily. However, the following rules must be eliminated.

If And( ?u[rdf:type->rdf:Property] )
Then Do(Assert( ?u[rdfs:subPropertyOf->?u] ))

If And( ?x[?p->?y]
        ?p[rdfs:range->?c]
        ?c IN List(rdfs:Class
                   rdfs:Datatype
                   rdfs:ContainerMembershipProperty) )
Then Do(Assert( ?y[rdf:type->?c] ))

If And( ?x[?p->?y]
        ?p[rdfs:domain->?c]
        ?c IN List(rdfs:Class
                   rdfs:Datatype
                   rdfs:ContainerMembershipProperty) )
Then Do(Assert( ?x[rdf:type->?c] ))

If And( ?x[?p1->?y]
        ?p1[rdfs:subPropertyOf->?p2]
        ?p2 IN List(rdfs:subPropertyOf
                    rdfs:subClassOf
                    rdfs:domain
                    rdfs:range) )
Then Do(Assert( ?x[?p2->?y] ))

If And( ?x[?p1->?y]
        ?p1[rdfs:subPropertyOf->rdf:type]
        ?y IN List(rdfs:Class
                   rdfs:Datatype
                   rdfs:ContainerMembershipProperty) )
Then Do(Assert( ?x[rdf:type->?y] ))

If And( ?x[rdf:type->?c1]
        ?c1[rdfs:subClassOf->?c2]
        ?c2 IN List(rdfs:Class
                    rdfs:Datatype
                    rdfs:ContainerMembershipProperty) )
Then Do(Assert( ?x[rdf:type->?c2] ))

Then  parallel  inference  is  correct  for  the  Par-RDFS  ruleset  given  in  table  B.5.  Note 
that  rule  rdfs6  was  eliminated  entirely.  In  [21],  Hendler  and  I  reasoned  that  rule 
rdfs6  was  suitable  for  parallel  inference  because  the  kinds  of  triples  it  infers  do  not  - 
in  the  context  of  the  finite  RDFS  closure  and  with  certain  axiomatic  assumptions  on 
the  data  -  lead  to  further  novel  inferences  that  would  need  to  be  replicated.  Those 
same  axiomatic  assumptions  have  not  been  assumed  on  the  data  in  this  work,  and 
so  that  argument  does  not  necessarily  (and  almost  certainly  does  not)  hold  here. 

4.3.2  Restricting  OWL2RL 

Following the methodology for the OWL2RL ruleset given in table B.6 is much more complicated given the large number of rules and patterns involved, and so in this section, just the highlights are reviewed. Note that the OWL2RL rules presented herein are different from those in [3]. They are a RIF variation of the OWL2RL rules from [61], deviating to make the rules amenable to forward-chaining, following advice from [61] as well.

First  notice  that  in  the  OWL2RL  ruleset,  many  of  the  rules  will  get  eliminated 
due  to  corollary  25.  One  such  example  is  rule  prp-fp. 

If And( ?p[rdf:type->owl:FunctionalProperty]
        ?x[?p->?y1]
        ?x[?p->?y2] )
Then Do(Assert( ?y1[owl:sameAs->?y2] ))

The  reason  for  direct  elimination  of  this  rule  is  that  it  contains  two  patterns  which 
match  all  triples  (frames).  By  corollary  25,  the  triples  (facts)  matching  one  of  these 
patterns  must  be  replicated,  which  means  that  all  triples  (frame  facts)  must  be 
replicated.  This  defeats  the  purpose  of  parallelization,  and  so  no  choice  remains 
except  to  eliminate  rule  prp-fp. 
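
The observation that prp-fp contains two patterns matching all triples can be checked mechanically. The Python sketch below only counts such patterns in a rule condition; it does not implement corollary 25 itself. Representing a frame pattern as a (subject, predicate, object) tuple with variables marked by a leading "?" is an assumption made for illustration.

```python
def is_variable(term):
    """Variables are written with a leading question mark, as in the rules above."""
    return term.startswith("?")


def matches_all_triples(pattern):
    """True if all three positions are distinct variables, so any triple matches."""
    return all(is_variable(t) for t in pattern) and len(set(pattern)) == 3


# Condition of prp-fp, written as (subject, predicate, object) patterns.
prp_fp_condition = [
    ("?p", "rdf:type", "owl:FunctionalProperty"),
    ("?x", "?p", "?y1"),
    ("?x", "?p", "?y2"),
]

unrestricted = [p for p in prp_fp_condition if matches_all_triples(p)]
```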

Before  any  steps  are  taken,  relsat  (the  SAT  solver  that  I  used  [59])  reports 
that  there  are  381,157,376  possible  solutions,  which  seems  too  few.  This  report  is 
accompanied  by  a  warning:  “WARNING:  Not  using  a  bignum  package.  Solution 
counts  may  overflow.”  Upon  proceeding  with  step  1,  forcing  replication  of  the  facts 
matched  by  the  patterns  in  table  B.8,  relsat  reports  that  there  are  zero  possible 
solutions,  although  it  enumerates  many  solutions.  This  seems  to  suggest  that  the 
number  of  solutions  is  so  large  that  whatever  integer  datatype  is  being  used,  it  is 
not large enough to represent the value. After step 2, relsat again reports zero possible solutions, although it provides an example solution. After forcing arbitrary placement of facts matched by patterns in table B.8, relsat again reports zero possible solutions, although it provides an example solution. After steps 4 and 5, relsat
reports  180  possible  solutions.  From  step  6,  P  contains  534  patterns. 
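
A reported count of zero alongside enumerated solutions is consistent with fixed-width integer overflow. As a hedged illustration only, the following Python sketch simulates an unsigned counter of a given width; the 32-bit width is an assumption, since the integer type used internally by relsat is not known here. Because Python integers have arbitrary precision, the wraparound is simulated with a modulus.

```python
def wrapped_count(true_count, bits=32):
    """Value that an unsigned counter of the given bit width would hold."""
    return true_count % (1 << bits)


# Hypothetical example: 40 unconstrained Boolean variables give 2**40 solutions.
true_count = 2 ** 40
reported = wrapped_count(true_count)  # a multiple of 2**32 wraps around to zero
```

Any true count that is an exact multiple of the counter's range would be reported as zero in this way, and other large counts would be reported as misleadingly small values.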

The  remaining  steps  pick  a  single  solution.  In  short,  using  the  rules  in  table 
B.10,  replicating  facts  matching  patterns  in  B.9,  parallel  inference  is  correct  (when 
using a RAOC). Unlike with restricting the CoreRDFS and RDFS rulesets, a significant degree of the semantics represented by the original rules has been sacrificed, which can be observed by comparing the rules in table B.6 with the rules in table B.10.

4.4  Summary 

In  summary,  this  chapter  adapted  the  general,  sufficient  conditions  derived  in 
chapter 3 to apply specifically to a special case of distribution schemes called replication schemes. Determining whether these sufficient conditions are met was shown
to  be  reducible  to  2SAT.  The  reduction  to  2SAT  was  then  augmented  to  allow  for 
the  possibility  to  eliminate  rules  from  the  ruleset  in  order  to  improve  parallelization, 
and  the  new  reduction  is  to  3SAT.  The  3SAT  reduction,  though,  can  easily  result  in 
3SAT  formulas  for  which  there  are  many  possible  solutions,  and  thus,  can  quickly 
become  intractable.  A  methodology  was  then  proposed  for  how  to  reduce  the  search 
space  for  3SAT  assignments.  Although  the  methodology  does  not  necessarily  lead 
to  an  optimal  solution,  it  is  nonetheless  effective  in  restricting  rulesets  for  correct 
parallel  inference,  as  was  illustrated  by  restriction  of  the  CoreRDFS  ruleset,  the 
RDFS  ruleset,  and  the  OWL2RL  ruleset. 


CHAPTER  5 

EVALUATION  OF  PARALLEL  INFERENCE  ON 
SUPERCOMPUTERS 

To empirically test the theoretical findings of chapters 3 and 4, an evaluation is performed for parallel inference with rulesets that have been restricted to be amenable to parallel inference, in combination with large, established RDF datasets.

Section  5.1  details  the  parameters  of  the  evaluation,  including  the  rulesets, 
datasets,  metrics,  and  supercomputers  used,  as  well  as  a  discussion  of  the  details 
of  the  (sequential)  inference  engine  used  by  each  processor.  The  bulk  of  the  actual 
evaluation  is  reported  in  section  5.2. 

5.1  Parameters  of  Evaluation 

A  thorough  description  of  the  parameters  of  the  evaluation  is  necessary  to 
interpret  the  results  reported  in  section  5.2.  This  section  provides  such  details. 
All  code  was  written  in  C++  (except  for  source  code  from  the  LZO  library,  which 
is  written  in  C)  using  the  Message  Passing  Interface  [62]  (MPI)  for  interprocess 
communication. 

5.1.1  Software  Implementation  of  Inference  Engine 

In the past [21], I had used librdf [63] (aka Redland) to implement inference on an x86-based cluster, using iterative SPARQL CONSTRUCT queries as rules [64]. However, after many attempts, it did not seem feasible to install and use librdf on a Blue Gene¹⁸ due to dependencies on other libraries. This is a common problem in using specialized supercomputers: it is difficult to install and use such libraries on atypical architectures. For this reason, and desiring to demonstrate scaling up to large numbers of processors (like the thousands provided by a Blue Gene), I was compelled to implement an inference engine on my own. However, since a sequential inference engine is not the main purpose of the work presented herein, the implementation is naive. This should be kept in mind when interpreting the relative speedup results in section 5.2 since it is well-known that relative speedup (or relative performance metrics in general) favors less efficient algorithms [65]. However, the execution times are reported as well in order to qualify any bias that may be present in the speedup results.

¹⁸ Specifically, I had tried this on a Blue Gene/L in 2009 to no avail. I am uncertain as to whether success could be had on later, more POSIX standard versions of the Blue Gene. By the time a Blue Gene/Q was made available to me, I had already written the code used in this evaluation.

The inference engine uses a forward-chaining, materialization strategy. That is, starting with the data (or facts), it applies the rules and adds/removes data (or inferences in the case of adding data) into the body of facts as inference progresses. The choice to perform forward-chaining here is based primarily on simplicity. Forward-chaining inference engines are easier to implement than, say, backward-chaining reasoners. The main performance drawback, though, is that forward-chaining materialization is space-intensive. Because all inferences must be stored explicitly, it is quite possible to exhaust memory, and in fact, I have easily done so with some rulesets and datasets. Regardless, the evaluation herein focuses primarily on parallel scalability, i.e., how some performance characteristic changes while varying the number of processors. Thus, efficiency of the sequential inference algorithm is secondary, and important only inasmuch as each processor must be able to perform inference. However, this decision does imply some restrictions on this evaluation, which are discussed further in this section and in section 5.1.4.

The inference engine simply iterates over the rules in the order that they are given (see section B.3), terminating when no more changes are made to the dataset (fixpoint semantics). The conflict set is never explicitly formed, and doing so is unnecessary under the semantics of an AOC because all rule instances will be fired anyway. Therefore, the rules are fired in a Datalog-like fashion, translating the rule conditions into relational queries and then using the results to fire actions. None of the rulesets under consideration (to be discussed in section 5.1.4) in this evaluation contain negation (except for safely-used equality formulas) or retraction, so order of rule firing is inconsequential, and a finite closure can always be produced.¹⁹ The queries of the condition formulas are executed from left to right in the order shown in the rule condition (i.e., a left-deep query execution plan). If at any point a join or selection results in zero results, the query returns immediately with no results. Care must be taken to make sure that equality and built-ins come after other atomic formulas that bind their variables.

¹⁹ Note also that the rulesets, discussed a bit later, contain neither functions nor built-ins, and equality is used “safely.” See [54, 66] for more details on the safeness of rules, which guarantees a finite closure for rulesets without negation or retraction.
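
The iteration strategy just described can be sketched in a few lines. The following Python illustration is a deliberately naive stand-in for the C++ engine, not a transcription of it: each rule is modeled as a hypothetical function from the current set of triples to a set of inferred triples, and the loop fires the rules in their given order until a full pass adds nothing new.

```python
def rule_scm_sco(facts):
    """scm-sco: subclass transitivity."""
    return {(c1, "rdfs:subClassOf", c3)
            for (c1, p, c2) in facts if p == "rdfs:subClassOf"
            for (c2b, q, c3) in facts if q == "rdfs:subClassOf" and c2b == c2}


def rule_cax_sco(facts):
    """cax-sco: instances of a class are instances of its superclasses."""
    return {(x, "rdf:type", c2)
            for (c1, p, c2) in facts if p == "rdfs:subClassOf"
            for (x, q, c1b) in facts if q == "rdf:type" and c1b == c1}


def closure(facts, rules):
    """Fire the rules in order until a full pass produces no new facts."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = rule(facts) - facts
            if new:
                facts |= new
                changed = True
    return facts


# Made-up example data.
initial = {
    ("ex:A", "rdfs:subClassOf", "ex:B"),
    ("ex:B", "rdfs:subClassOf", "ex:C"),
    ("ex:x", "rdf:type", "ex:A"),
}
result = closure(initial, [rule_scm_sco, rule_cax_sco])
```

Because these rules only add facts over a fixed vocabulary, the loop is guaranteed to terminate, which mirrors the finite-closure argument above.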

Each atomic formula transforms into a select-project-rename operation, the exception being equality formulas. Equality formulas containing ground terms (this includes all restrictions as defined in definition 55) are grouped together by shared variables and executed as a single selection operation. The persistent and growing dataset (set of triples, factset) is implemented as a std::set with predicate-object-subject (POS) ordering. Should a triple pattern (frame formula) arise for which a POS index is unsuitable, a scan is performed on the std::set. Atoms are also stored in std::sets, each predicate having its own std::set, ordered lexicographically on the arguments of the atoms. As with the set of triples, when the ordering is not suited to the atom, a scan is performed. Intermediate relations are implemented as std::deques, the choice of which was determined based on experience and comparison to performance using std::vectors and std::lists. Triples are implemented as fixed arrays of three terms, and tuples in intermediate results and of atoms are implemented as fixed arrays of some maximum length determined based on the specific ruleset under consideration. (The length is the maximum number of variables used in a rule, or the largest predicate arity, whichever is larger.) The data is initially dictionary-encoded (discussed in detail in appendix A), so each term is represented as a uniquely-assigned 64-bit integer.
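
The effect of POS ordering on lookups can be illustrated with a sorted sequence. In the Python sketch below, a sorted list of (predicate, object, subject) tuples stands in for the ordered std::set, and a binary search stands in for a lower-bound lookup; both substitutions, along with the function names and the integer term identifiers, are assumptions made for illustration. A pattern with a bound predicate becomes a contiguous range scan.

```python
from bisect import bisect_left


def build_pos_index(triples):
    """Sort dictionary-encoded (s, p, o) triples into (p, o, s) order."""
    return sorted((p, o, s) for (s, p, o) in triples)


def lookup_by_predicate(index, predicate):
    """Return all (s, p, o) triples with the given predicate via a range scan."""
    # (predicate,) sorts before every (predicate, o, s), so this finds the
    # first entry for the predicate, if there is one.
    i = bisect_left(index, (predicate,))
    results = []
    while i < len(index) and index[i][0] == predicate:
        p, o, s = index[i]
        results.append((s, p, o))
        i += 1
    return results


# Terms are integers, as they would be after dictionary encoding (made-up IDs).
triples = [(10, 1, 20), (12, 2, 22), (11, 1, 21)]
index = build_pos_index(triples)
matches = lookup_by_predicate(index, 1)
```

A pattern that binds only the subject gains nothing from this ordering, which is the situation in which a full scan is performed.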

Joins are performed using a one-sided index join. That is, of the two relations to be joined, the smaller one is indexed on the values of the join variables, and iterating over the larger relation, the index is probed for compatible tuples. The index was implemented using std::map, and so letting $R$ and $S$ be the larger and smaller relations (respectively) to be joined, building the index takes $O(|S| \cdot \lg |S|)$ time and probing it takes $O(|R| \cdot \lg |S|)$ time.
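
A one-sided index join of this kind can be sketched as follows. This Python illustration is an assumption-laden stand-in for the C++ code: it joins on a single column, and it uses a hash-based dictionary for the index, so its expected costs differ from the logarithmic factors of the tree-based std::map described above.

```python
from collections import defaultdict


def index_join(r, s, r_key, s_key):
    """Join tuples of r and s where r[r_key] == s[s_key], indexing the smaller one."""
    swapped = len(r) < len(s)
    if swapped:
        # Ensure that s always names the smaller relation, which gets indexed.
        r, s, r_key, s_key = s, r, s_key, r_key
    index = defaultdict(list)
    for t in s:
        index[t[s_key]].append(t)
    out = []
    for t in r:  # iterate over the larger relation and probe the index
        for match in index.get(t[r_key], []):
            # Keep the column order of the original arguments.
            out.append(match + t if swapped else t + match)
    return out


# Made-up example relations.
larger = [(1, "a"), (2, "b"), (3, "c")]
smaller = [(2, "x"), (3, "y")]
joined = index_join(larger, smaller, 0, 0)
```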

It is important to understand that although the operational semantics are prescribed in the form of forward-chained inference, that does not mean that forward-chaining is the necessary mechanism by which inference must be performed. The findings of the previous chapters apply just as well to other mechanisms of inference inasmuch as those mechanisms effectively produce the same outcome as forward-chaining²⁰. An example of this would be Datalog inference. Since Datalog programs are known to have a finite closure, both forward-chaining and backward-chaining are capable of effectively providing the same results. From the perspective of the operational semantics in section 3.1.2, whichever mechanism is used is irrelevant, as long as from the perspective of the user, the answers come out the same as though forward-chaining were performed exactly as described in the operational semantics.²¹

5.1.2  Supercomputers  and  their  Configurations 

Two supercomputers are used in this evaluation. The first is an SMP system called Mastiff with four 16-core Opteron 6272 processors²², 512 GB of RAM, and 2 TB of high-speed RAIDed disk. MPICH2 version 1.4.1 is the MPI version used for evaluation on Mastiff. The operating system on Mastiff was Ubuntu 12.10. The code was compiled with the -O3 optimization flag. Additionally, when running on Mastiff, CPU affinity was set so that each processor was allowed to run on a dedicated CPU to prevent overhead of context switching and cache misses that can arise when the operating system reassigns processes to different CPUs. For example, for two processors, processor zero is assigned to CPU zero, and processor one is assigned to CPU 32. As larger numbers of processors are used, caches are increasingly shared. Cache sharing begins after four processors.

The other system is a newly installed Blue Gene/Q [22] at the CCNI. Each node of the Blue Gene/Q has 16 compute cores at 1.6 GHz each and 16 GB of RAM. Each node can be overcommitted beyond 16 tasks up to 64 tasks, although due to memory constraints, I have not taken advantage of overcommitting. The nodes are interconnected by a 5D torus network. IBM’s MPI compiler is used for compilation with the -O3 -qstrict optimization flags. When running jobs with srun, --runjob-opts='--envs MALLOC_MMAP_MAX_=0 BG_MAPCOMMONHEAP=1' was used. MALLOC_MMAP_MAX_=0 ensures that memory does not need to be “zeroed out” when deallocated; BG_MAPCOMMONHEAP=1 ensures that heap allocation is uniform across processors.

²⁰ Also, if the distribution scheme is not a replication scheme, then each processor must be sure to “filter out” inferences according to 6 in the distribution scheme as previously discussed in regards to line 9 of algorithm 3. Such does not apply to this evaluation since replication schemes are used.

²¹ Of course, the user may perceive differences in performance, but again, not differences in outcome.

²² Used here in the hardware sense.

For both supercomputers, posix_memalign is used for memory allocation in place of malloc or similar calls (although no special allocator was written for use with STL containers).

5.1.3  Metrics 

This evaluation focuses on scalability of two forms. The first is strong scaling, that is, the ability to reduce the execution time for a fixed workload by adding more processors. This is the traditional perspective of scalability from the parallel computing community, and its primary metric is speedup $S(p) = T_1 / T_p$, where $T_1$ is the execution time for a single processor and $T_p$ is the execution time for $p$ processors. In the ideal case, $S(p) = p$, although in practice it is sometimes possible to achieve superlinear speedup when a large enough number of processors is used such that each processor’s local data fits into a lower-level cache than with a smaller number of processors. There are various definitions of speedup depending on the interpretation of $T_1$. In absolute speedup, $T_1$ is the execution time for a single processor using the best sequential algorithm. However, in practice, relative speedup is often used instead, where $T_1$ is determined using the parallel algorithm on a single processor. Relative speedup is the strong scaling metric reported herein, although there are known issues with relative speedup favoring slower algorithms. To remedy this, execution times are reported as well. Efficiency for any form of speedup is defined as $E(p) = S(p) / p$, normalizing the achieved speedup to a value between zero and one23.

23 Again, with the exception of superlinear speedup, in which case efficiency can be greater than one.

An often overlooked metric that will be used in this evaluation is the Karp-Flatt metric [25]. The Karp-Flatt metric is the empirically determined serial fraction of computation. It provides a good indication of how parallel a program is, which is not always directly apparent from speedup or efficiency. This metric will prove quite useful in interpreting the evaluation results.
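As a worked illustration of the strong scaling metrics, the following Python sketch computes relative speedup, efficiency, and the Karp-Flatt metric from two execution times. The Karp-Flatt formula used here, e = (1/S - 1/p) / (1 - 1/p), is the standard definition from the literature rather than one stated in this section, and the function names are hypothetical. The sample values are the infer-phase times for Mastiff from table 5.5 (single processor, and the maximum over 64 processors).

```python
def to_seconds(clock):
    """Convert a time such as '2:45:22' or '4:43' into seconds."""
    seconds = 0
    for field in clock.split(":"):
        seconds = seconds * 60 + int(field)
    return seconds


def speedup(t1, tp):
    """Relative speedup S(p) = T_1 / T_p."""
    return t1 / tp


def efficiency(t1, tp, p):
    """Efficiency E(p) = S(p) / p."""
    return speedup(t1, tp) / p


def karp_flatt(t1, tp, p):
    """Experimentally determined serial fraction (standard Karp-Flatt formula)."""
    s = speedup(t1, tp)
    return (1.0 / s - 1.0 / p) / (1.0 - 1.0 / p)


t1 = to_seconds("2:45:22")   # infer phase, one processor
t64 = to_seconds("4:43")     # infer phase, slowest of 64 processors

print(round(speedup(t1, t64), 3))
print(round(efficiency(t1, t64, 64), 3))
print(round(karp_flatt(t1, t64, 64), 3))
```

Under these assumptions, the three printed values agree with the Sdup, Eff, and KF entries reported for the infer phase at 64 processors. A negative Karp-Flatt value simply reflects superlinear speedup.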

The second form of scaling is data scaling, a perspective I have proposed in [7]. Data scaling is a form of weak scaling. In weak scaling, the workload is not fixed like it is in strong scaling. Rather, it varies in some way as a function of the number of processors. Gustafson scaling [67], also known as fixed-time scaling, scales the sequential workload with the number of processors. In this case, execution time $T_1$ is fixed, the amount of work that $p$ processors can do in $T_1$ time is determined, and then $T_p$ is the time it takes a single processor to do the same amount of work that $p$ processors did in $T_1$ time. So-called scaled speedup is then defined as $S(p) = T_p / T_1$, and ideally $S(p) = p$. Another form of weak scaling is memory-bounded scaling [68] in which the workload grows to consume the full memory capacity of the available processors. This is very close to the idea of data scaling, but it differs in that it focuses on memory capacity instead of data quantity, the two of which do not always coincide, particularly in the case of inference when one does not know a priori how large the closure will be.

In contrast, data scaling is concerned with the ability to handle larger quantities of data by adding more processors. In the case of inference over RDF data, the unit of data would be the RDF triple. Fixing the ratio of data quantity per processor in some sensible way to a constant $C$, referred to as the processor capacity, $T_p$ is then defined as the time it takes for $p$ processors to execute with input of size $C \cdot p$. The metric for data scaling is referred to as growth efficiency, $G(p) = T_1 / T_p$. Ideally, $G(p) = 1$, and in the worst case, $G(p)$ is near zero. This metric is referred to as a form of efficiency rather than speedup because its values range between zero and one instead of zero and $p$, making it more similar to efficiency than speedup.
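The definition of growth efficiency can be made concrete with a small Python sketch. The processor capacity and the execution times below are invented purely for illustration (they are not measurements from this evaluation), and the function names are hypothetical.

```python
def data_scaling_input_size(capacity, p):
    """Input size C * p given to p processors at processor capacity C."""
    return capacity * p


def growth_efficiency(t1, tp):
    """G(p) = T_1 / T_p, where T_p is measured on input of size C * p."""
    return t1 / tp


capacity = 1_000_000                      # hypothetical capacity C, in triples
times = {1: 100.0, 2: 104.0, 4: 125.0}    # hypothetical T_p, in seconds

for p, tp in times.items():
    size = data_scaling_input_size(capacity, p)
    print(p, size, round(growth_efficiency(times[1], tp), 3))
```

With these made-up times, quadrupling both the data and the processors raises the execution time from 100 to 125 seconds, which corresponds to a growth efficiency of 0.8.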

In  short,  the  metrics  for  this  evaluation  are  execution  time,  relative  speedup, 
efficiency,  the  Karp-Flatt  metric,  and  growth  efficiency. 

5.1.4  Rulesets 

Although the rulesets in tables B.5 and B.10 are restricted versions of the RDFS and OWL2RL rulesets (respectively) for which parallel inference is correct, they include some inference rules which, in practice, are troublesome and, in some cases, uninteresting. Such rules have been excluded from this evaluation. The general reason is memory constraint. Some rules lead to a drastic increase in the number of inferences, but many of these inferences appear to add only nominal value and are likely more efficiently (space-wise) inferred in a backward-chained fashion. (This has also been noted by Urbani et al. [20].)

For the restricted version of RDFS inference, the Par-CoreRDFS ruleset is used as shown in table 4.2. At first, exclusion of rules rdf1, rdfs4a, and rdfs4b was necessary (particularly the latter two) because they resulted in a large number of ?x1[rdf:type->rdfs:Resource] and ?x1[rdf:type->rdf:Property] triples, sometimes resulting in exhaustion of memory. Rules rdf2 and rdfs1 were also excluded due to lack of support for the pred:is-literal-XMLLiteral and pred:is-literal-PlainLiteral built-ins (defined in [69]). After eliminating those rules, keeping rules rdfs8, rdfs10, rdfs12, rdfs13, and the axiomatic triples seemed like an arbitrary compromise between the RDFS and CoreRDFS rulesets, although it would be more complete with respect to the RDFS semantics [17]. Regardless, it seems more useful to consider the slightly restricted version of the CoreRDFS ruleset rather than the seemingly arbitrarily restricted version of the RDFS ruleset because the CoreRDFS ruleset represents the minimum RDFS inference support of most (if not all) RDFS inference engines.

Turning to OWL2RL, the Par-OWL2 ruleset in table B.10 represents a restricted version of the OWL2RL ruleset that is amenable to parallel inference. When initially attempting inference with the Par-OWL2 ruleset on the 2012 Billion Triples Challenge (BTC2012) dataset, it was quite easy to exhaust memory. The primary cause was rules eq-ref, eq-ref1, and eq-ref2, which essentially require the explicit statement that everything is the same as itself. For the evaluation, these rules have been excluded. For a similar reason, rules scm-cls2 and scm-cls3 have been excluded. The following rules were excluded, being deemed “uninteresting,” in order to further reduce memory consumption (although the savings in some cases are admittedly quite small): scm-cls, scm-cls1, scm-op, scm-op1, scm-dp, scm-dp1, prp-ap-l, prp-ap-c, prp-ap-sa, prp-ap-idb, prp-ap-d, prp-ap-pv, prp-ap-bcw, prp-ap-iw, cls-thing, and cls-nothing1.
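To see why the reflexive equality rules are so costly in memory, consider the following minimal Python sketch. It assumes the standard OWL 2 RL formulation of eq-ref, in which every subject, predicate, and object of every triple is inferred to be owl:sameAs itself; how that rule is split into eq-ref, eq-ref1, and eq-ref2 is not modeled here, and the sample triples and the function name are hypothetical.

```python
SAME_AS = "owl:sameAs"


def eq_ref(triples):
    """One application of eq-ref: each term of each triple is owl:sameAs itself."""
    terms = {term for triple in triples for term in triple}
    return {(term, SAME_AS, term) for term in terms}


data = {
    ("ex:a", "ex:p", "ex:b"),
    ("ex:b", "ex:p", "ex:c"),
}
inferred = eq_ref(data)
print(len(data), len(inferred))
```

In this sketch, the number of inferred triples equals the number of distinct terms in the input, so a dataset with billions of distinct terms would gain billions of triples that carry essentially no new information.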

The semantic purist may consider these exclusions as an arbitrary disregard for completeness, but it is not so arbitrary. This is simply an example of the delicate balance between theory and practice, and in this case, the particular difficulty in meeting the memory requirements for inference on large datasets. It is not so unusual for there to be a discrepancy between theory and practice. For example, from the point of view of theoretical computational complexity, it is often considered that a polynomial time algorithm is efficient, but in practice, if the highest power of the polynomial is sufficiently large, or if the size of the input is sufficiently large, the algorithm becomes impractical. Another example is Nick’s Class (NC) for parallel complexity, which says that if a problem can be solved in polylogarithmic time with a polynomial number of processors, then it is considered to be “fast” in parallel [70]. In practice, though, developers tend not to be able to scale the number of processors polynomially. Thus, it should come as no surprise that in this case, there are practical considerations (memory) which inhibit demonstration of all theoretical findings, considering the realistic resources at one’s disposal. Therefore, the discrepancy between the rulesets derived in chapter 4 and the ones used in this evaluation does not imply a flaw or lack of value in the findings of the previous chapters, but it does mean that the value of the rulesets derived in chapter 4 for parallel inference is not demonstrated by this evaluation. Overcoming the memory constraints to empirically demonstrate the value of the rulesets derived in chapter 4 remains as future work, and this evaluation instead focuses on even more restricted versions of the rulesets.

The Par-OWL2 rules excluded from the evaluation are marked with a † in table B.10. Note that the same replication scheme used for the Par-OWL2 ruleset is valid for the further restricted version since the same 3SAT reduction can be used as for deriving the Par-OWL2 ruleset, but this time, choosing the same assignment of variables with the exception that, e.g., x(eq-ref) is set to one to indicate the elimination of rule eq-ref. Doing so still results in an assignment of variables that satisfies the 3SAT formula, which is clear by inspection of the example in section 4.2.2.

To summarize, there are two rulesets used in this evaluation: (1) the Par-CoreRDFS rules in table 4.2, and (2) the rules not marked with † in table B.10. The latter will be referred to as the Par-MemOWL2 ruleset.

5.1.5  Datasets 

This  evaluation  uses  two  datasets  that  represent  different  ends  of  the  spectrum 
in  terms  of  difficulty.  The  first  dataset,  referred  to  herein  as  LUBM10K,  is  the 
dataset  generated  using  the  Lehigh  University  Benchmark  (LUBM)  [23]  dataset 
generator  to  generate  synthetic  data  for  10,000  universities.  When  generating  the 
dataset,  a  random  seed  of  zero  was  used  (which  is  the  common  practice). 

LUBM is one of the earliest synthetic RDF dataset generators used for performance evaluation of OWL reasoning systems. For this reason, it is useful for comparing with previous systems. However, LUBM data is unrealistic. Relative to real-world datasets, there is very little data skew, and the data is quite “clean” (e.g., no chance of “ontology hijacking” [71]). For the purposes of this evaluation, though, it represents a necessary test for scalability. That is, if inference is not scalable on LUBM data, then it will very likely not be scalable for real-world datasets.

The  other  dataset  used  in  this  evaluation  is  the  2012  Billion  Triples  Challenge 
(BTC2012)  dataset.  The  BTC2012  dataset  is  a  set  of  RDF  quads  crawled  from  the 
Web  for  the  purposes  of  a  challenge  in  which  submissions  compete  for  the  designation 
of  champion  as  determined  by  a  committee  who  decides,  more  or  less  (in  my  own 
determination),  which  submission  is  the  most  interesting  considering  the  complexity 
of  the  dataset.  The  important  distinction  of  the  Billion  Triples  Challenge  as  opposed 
to  the  Open  Track  Challenge  is  that  the  crawled  dataset  must  be  used,  and  that  is 
indeed the true challenge. The BTC datasets are well-known for having data that are troublesome for inference (in terms of semantics [48] and performance [44]), high data skew [44, 47, 72], and even syntactic problems [73]. For the purposes of this
evaluation,  BTC2012  represents  a  sufficient  test  for  scalability.  That  is,  if  inference 
is  scalable  on  BTC2012,  then  it  will  very  likely  be  scalable  for  other  real-world 
datasets. 

These datasets have been preprocessed prior to inference, which is common in evaluations like this (e.g., see [51]). There are three preprocessing steps for strong scaling, and there is an additional preprocessing step for data scaling. The first is normalization, applied only to BTC2012. The second is LZO compression, applied to both datasets. The third is dictionary encoding, applied to both datasets. The fourth is replication of triples as required for correct parallel inference. In the strong scaling evaluations, this step is part of the overall inference process and thus is not considered preprocessing. However, in order to facilitate data scaling (the input size increases linearly with the number of processors), replication of these triples is performed prior to inference as a preprocessing step.

All  steps  can,  in  theory,  be  performed  in  parallel,  and  the  code  has  been  written 
to  support  it.  However,  given  limited  availability  of  supercomputers,  normalization 
and  LZO  compression  were  performed  on  a  single  machine,  specifically  a  Mac  mini 
server  with  10  GB  of  RAM  and  768  GB  of  disk,  256  GB  of  which  was  a  solid 
state  drive.  With  such  limited  resources  and  such  large  datasets,  normalization  and 
LZO  compression  took  several  days  for  a  single  dataset.  Dictionary  encoding  and 
replication,  on  the  other  hand,  were  performed  on  supercomputers.  The  rest  of  this 
section  is  dedicated  to  describing  preprocessing. 

In  the  case  of  BTC2012,  the  quad  position  was  first  stripped  out  to  render 
RDF  triples,  and  then  the  dataset  was  “normalized” 24  as  in  [73]25.  Normalization 
includes  the  following: 

•  removing lines (which should correspond to RDF triples) that are clearly invalid in N-Triples [75], which includes26:

    -  multiple triples on one line,

    -  lines with incorrectly delimited RDF terms (e.g., an IRI starting with < but not ending with >);

•  removal of lines containing invalid UTF-8 sequences27;

•  removal of lines with invalid Unicode codepoints according to UCS 6.2.0 [77];

•  removal of lines with syntactically invalid IRIs according to RFC 3987 [55];

•  removal of lines with invalid language tags according to RFC 5646 [78];

•  normalization of IRIs according to RFC 3987 [55], which includes:

    -  percent-unencoding any unnecessarily percent-encoded characters,

    -  NFC normalization [79],

    -  path resolution (e.g., /a/../b becomes /b),

    -  lower-casing the scheme and host;

•  NFC normalization of the lexical representations of RDF literals;

•  normalization of language tags according to RFC 5646 [78].

24 In previous work [21, 74], I had often used a naive and simplistic N-triples parser which worked well under the assumption that the data was correctly formatted as N-triples. However, this became less useful in later BTC datasets.

25 Note that although there were initially some bugs as reported in [73], they have been worked out prior to this evaluation.

26 Note that non-alphanumeric blank node labels were tolerated.

The BTC2012 dataset was then deduplicated (that is, duplicate triples were removed). No such normalization was performed on LUBM10K.
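The IRI normalization steps listed above can be sketched in Python using only the standard library. This is an illustrative approximation, not the normalizer used for this evaluation: it assumes absolute IRIs with no userinfo component, treats only the ASCII unreserved characters as unnecessarily percent-encoded, and all function names are hypothetical.

```python
import re
import unicodedata
from urllib.parse import urlsplit, urlunsplit

UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def unencode_unreserved(text):
    """Decode %XX only for unreserved ASCII; upper-case other escapes."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, text)


def remove_dot_segments(path):
    """Resolve '.' and '..' segments of an absolute path."""
    out = []
    for segment in path.split("/"):
        if segment == "..":
            if len(out) > 1:      # never pop the leading empty segment
                out.pop()
        elif segment != ".":
            out.append(segment)
    if path.endswith(("/.", "/..")):
        out.append("")            # keep the trailing slash
    return "/".join(out)


def normalize_iri(iri):
    """NFC-normalize, lower-case scheme and host, unencode, resolve path."""
    parts = urlsplit(unicodedata.normalize("NFC", iri))
    path = remove_dot_segments(unencode_unreserved(parts.path))
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),     # assumes no case-sensitive userinfo
        path,
        unencode_unreserved(parts.query),
        unencode_unreserved(parts.fragment),
    ))


print(normalize_iri("HTTP://Example.ORG/a/../b%7Ec"))
```

The design choice here is to decode escapes before resolving the path, so that an escaped dot segment such as %2E%2E is resolved as well.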

It is important to note the distribution of the data in the N-Triples files at this point. The LUBM10K dataset is ordered as generated (triples are naturally grouped by subject) and actually contains a very small fraction of duplicates. The LUBM ontology is at the beginning of the file. The BTC2012 dataset is sorted as a result of the sequential deduplication process. These details will be important for a discussion of load-balancing in section 5.2.1.1.

27 Although the BTC2012 dataset is provided as (compressed) N-Quads [76] files, which requires US-ASCII encoding, UTF-8 sequences still occur in the form of escaped sequences.

Table 5.1: Dataset Sizes

                                      #Bytes
                                      NT       LZO                Encoded
Dataset   #Triples                             Data      Index    Data      Dict.
LUBM10K   1,334,681,485               224 GB   15.1 GB   448 KB   29.8 GB   23.0 GB
BTC2012   1,056,171,751               172 GB   17.8 GB   344 KB   23.6 GB   33.4 GB

Table 5.2: Closure Sizes in Number of (Unique) Triples

Dataset   Par-CoreRDFS    Par-MemOWL2
LUBM10K   1,667,939,414   1,950,780,866
BTC2012   2,429,721,741   (insufficient memory)

After normalization, all datasets were LZO-compressed so that the data could fit within the newly instituted disk quotas at the Computational Center for Nanotechnology Innovations (CCNI) [80]. LZO compression is a block compression algorithm, and the source code from the LZO 2.06 library [81] was used. The LZO-compression approach from [72] was used to directly compress the N-Triples data (i.e., there was no syntactic compression in the form of Sterno). A four MB block size was used with the LZO1X algorithm, and blocks were composed such that each block contained complete lines/triples. That is, no triples were split across blocks. During compression, a list of offsets into the compressed file is kept, where each offset determines the beginning of a compressed block. This allows for parallel decompression. Again, as with the normalization process, compression could in theory have been done in parallel, as demonstrated in [72], but at the time, it was more convenient to do it on a single machine, the same Mac mini server used for normalization. The LZO-compressed datasets are the versions that are deployed to the supercomputers. There, they are dictionary encoded in parallel, which is described in detail in appendix A.
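The block layout described above (whole lines per block, plus an index of block offsets) can be sketched as follows. Python's standard library has no LZO binding, so zlib is used here purely as a stand-in compressor; the function names, the small block size, and the sample lines are all hypothetical.

```python
import zlib


def compress_blocks(lines, block_size):
    """Pack whole lines into blocks of at most block_size bytes (a longer line
    gets a block of its own), compress each block independently, and record
    the offset at which each compressed block begins."""
    blob = bytearray()
    offsets = []
    block = bytearray()

    def flush():
        if block:
            offsets.append(len(blob))
            blob.extend(zlib.compress(bytes(block)))
            block.clear()

    for line in lines:
        if block and len(block) + len(line) > block_size:
            flush()
        block.extend(line)
    flush()
    return bytes(blob), offsets


def decompress_block(blob, offsets, i):
    """Decompress block i alone, using only the offset index."""
    end = offsets[i + 1] if i + 1 < len(offsets) else len(blob)
    return zlib.decompress(blob[offsets[i]:end])


lines = [b"<s%d> <p> <o> .\n" % i for i in range(500)]
blob, offsets = compress_blocks(lines, 1024)
blocks = [decompress_block(blob, offsets, i) for i in range(len(offsets))]
```

Because each block can be decompressed from its offset alone and always ends on a line boundary, different processors can be handed different blocks without coordinating on partial triples.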

Table 5.1 shows the sizes of the datasets in number of unique triples, bytes in the N-Triples file, sizes in LZO-compressed form, and sizes in dictionary-encoded form. It should be noted that the LZO-compressed sizes for LUBM10K are for the LUBM10K dataset as generated (i.e., including duplicate triples, which are relatively few). The closure sizes are given in table 5.2.

Table 5.3 gives the times to dictionary encode the datasets, and table 5.4 gives the times to replicate the appropriate triples prior to data scaling. Times are reported in the format minutes:seconds. For the Blue Gene/Q, the same number of nodes was used regardless of dataset, specifically 2,048 nodes. However, for LUBM10K, 16 processors (or tasks) were assigned per node, and for BTC2012, one processor (or task) was assigned per node. The times reported are overall times, including disk I/O, with the exception of the “Repl” column in table 5.4 which reports the actual time spent in performing replication (which includes querying out the triples to be replicated and the subsequent MPI calls).

Table 5.3: Dictionary Encoding Times

System        Dataset   Ruleset        P        Min     Avg     Max     Dev
Mastiff       LUBM10K   Par-CoreRDFS   64       27:11   28:03   28:33   0:18
Mastiff       LUBM10K   Par-MemOWL2    64       28:57   29:58   30:27   0:15
Mastiff       BTC2012   Par-CoreRDFS   64       38:32   43:42   46:45   1:53
Blue Gene/Q   LUBM10K   Par-CoreRDFS   32,768   0:52    1:03    1:14    0:02
Blue Gene/Q   LUBM10K   Par-MemOWL2    32,768   0:50    1:00    1:10    0:02
Blue Gene/Q   BTC2012   Par-CoreRDFS   2,048    9:32    9:37    9:56    0:02

Table 5.4: Replication Preprocessing Times Prior to Data Scaling

System        Dataset   Ruleset        P        Min    Avg    Max    Dev    Repl
Mastiff       LUBM10K   Par-CoreRDFS   64       1:20   1:33   1:40   0:06   0:14
Mastiff       LUBM10K   Par-MemOWL2    64       1:55   2:05   2:11   0:06   0:50
Mastiff       BTC2012   Par-CoreRDFS   64       1:15   1:24   1:29   0:03   0:38
Blue Gene/Q   LUBM10K   Par-CoreRDFS   32,768   0:11   0:20   0:24   0:02   0:07
Blue Gene/Q   LUBM10K   Par-MemOWL2    32,768   0:11   0:20   0:26   0:02   0:07
Blue Gene/Q   BTC2012   Par-CoreRDFS   2,048    0:06   0:07   0:08   0:00   0:05

Concerning dictionary encoding, it is important to note that caching of remote lookups (described in appendix A) is turned on for the Blue Gene/Q and turned off for Mastiff.

Note  that  no  metrics  are  reported  for  Par-MemOWL2  inference  on  BTC2012. 
The  reason  is  that  preliminary  evaluation  using  the  Par-MemOWL2  ruleset  with 
the  BTC2012  dataset  caused  memory  to  become  exhausted,  even  with  64  GB  per 
processor  (eight  processors  on  Mastiff).  Therefore,  it  appeared  infeasible  to  perform 
such  an  evaluation.  This  is  further  discussed  in  section  5.2. 


5.2  Scalability  of  Parallel  Inference 

Parallel  inference  consists  of  six  phases. 

1.  The  processors  collectively  read  the  ruleset  (which  has  already  been  encoded 
beforehand).  This  step  usually  takes  subsecond  time  and  so  is  not  explicitly 
included  in  the  reported  metrics. 

2.  Load. The processors each read the dictionary-encoded triples from their separate files and load the encoded triples into a std::set.

3.  Repl. The processors replicate the data according to the predetermined pattern assignment. This is done by first collectively reading the encoded patterns from a file. Then, the triples matching the patterns are queried out of the local dataset and written into a buffer. The processors broadcast the triples to each other using MPI::Intracomm::Allgatherv and then load the replicated triples into their std::sets. This step does nothing except perform a barrier in the data scaling evaluations because in that case, replication is performed as a preprocessing step.

4.  Infer.  The  processors  perform  inference  independently  of  each  other. 

5.  Uniq.  The  processors  deduplicate  the  data  by  hash-distributing  the  triples. 

6.  Write.  The  processors  write  their  local  triples  out  to  individual  files. 
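The repl, infer, and uniq phases listed above can be sketched as a single-process simulation in Python, with each processor modeled as a set of integer-encoded triples. This is only an illustration under stated assumptions: the actual system uses MPI and std::set, whereas here the allgather is simulated with a set union, the ruleset is reduced to one hypothetical rule, and the predicate codes and function names are invented.

```python
TYPE, SUBCLASS = 1, 2   # hypothetical dictionary codes for two predicates


def replicate(parts, matches):
    """Repl: every processor receives all triples matching the patterns."""
    shared = set()
    for local in parts:
        shared |= {t for t in local if matches(t)}
    return [local | shared for local in parts]


def infer(local):
    """Infer: local fixpoint of (x TYPE c), (c SUBCLASS d) => (x TYPE d)."""
    closure = set(local)
    while True:
        supers = [(s, o) for (s, p, o) in closure if p == SUBCLASS]
        new = {(x, TYPE, d)
               for (x, p, c) in closure if p == TYPE
               for (c2, d) in supers if c2 == c}
        if new <= closure:
            return closure
        closure |= new


def uniq(parts):
    """Uniq: hash-distribute triples so duplicates meet on one processor."""
    out = [set() for _ in parts]
    for local in parts:
        for triple in local:
            out[hash(triple) % len(parts)].add(triple)
    return out


parts = [
    {(10, TYPE, 100), (100, SUBCLASS, 101)},
    {(11, TYPE, 100), (101, SUBCLASS, 102)},
]
replicated = replicate(parts, lambda t: t[1] == SUBCLASS)
result = uniq([infer(local) for local in replicated])
```

In this toy case, replicating the SUBCLASS triples is what lets each processor infer independently and still reach, collectively, the same closure as sequential inference over all of the data.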

Metrics are reported for each of the phases as appropriate (except for reading the rules, as mentioned) and for the overall execution. The replication phase and the deduplication phase act (in part) as barriers. This adds some complexity to measuring execution time. For example, the time for the replication phase of a processor begins directly after the load phase finishes. Therefore, if the load phase is unbalanced, one processor could report a longer time in the replication phase simply because it is waiting on other processors and not because any actual work is necessary. Therefore, the processor reporting the most time spent in the replication phase is often (if not always) the processor reporting the least time spent in the load phase. For this reason, the sum of the maximum times for each phase is likely greater than the overall maximum time.

Therefore,  when  computing  metrics  for  each  phase,  for  communication  phases 
(repl,  uniq),  the  minimum  time  is  used,  and  for  other  phases  (load,  infer,  write), 
the  maximum  time  is  used.  For  the  overall  time,  the  maximum  time  is  used  since 
it  reflects  the  perceived  time  of  the  user.  The  same  holds  for  charts  of  execution 
times. 
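The aggregation rule just described can be checked with a short Python sketch. The values are the minimum and maximum per-phase times for two processors on Mastiff from table 5.5; which processor reported which value is not stated, so each list below is just the pair of observed times. The function names are hypothetical.

```python
def to_seconds(clock):
    """Convert a time such as '1:16:36' or '6:09' into seconds."""
    seconds = 0
    for field in clock.split(":"):
        seconds = seconds * 60 + int(field)
    return seconds


COMMUNICATION_PHASES = {"repl", "uniq"}


def reported_time(phase, seconds_per_processor):
    """Minimum for communication phases (waiting is excluded), else maximum."""
    if phase in COMMUNICATION_PHASES:
        return min(seconds_per_processor)
    return max(seconds_per_processor)


phases = {
    "load":  ["20:09", "20:30"],
    "repl":  ["0:00", "0:21"],
    "infer": ["1:16:36", "1:16:36"],
    "uniq":  ["1:09:05", "1:47:45"],
    "write": ["6:02", "6:09"],
}
seconds = {name: [to_seconds(c) for c in clocks] for name, clocks in phases.items()}

reported = {name: reported_time(name, values) for name, values in seconds.items()}
sum_of_maxima = sum(max(values) for values in seconds.values())
overall_maximum = to_seconds("3:30:53")   # Total/Max for two processors

print(reported)
print(sum_of_maxima, overall_maximum)
```

For this row of the table, the sum of the per-phase maxima exceeds the overall maximum by 28 seconds, which is the effect described above: part of the time attributed to a communication phase is really waiting caused by imbalance in the preceding phase.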
5.2.1 Strong Scaling

This section is the strong scaling portion of the evaluation. Recall that strong scaling is concerned with reducing the execution time for a fixed workload by adding more processors. First, the results for Mastiff are considered, and then the Blue Gene/Q is considered. Focus should be given to the inference phase because parallelization of inference has been the focus of this thesis. Metrics for other phases are included (particularly for the overall process) in order to provide indication of other barriers to scalability that are encountered in practice.

5.2.1.1 Mastiff

Table  5.5  gives  the  metrics  for  Par-CoreRDFS  inference  on  LUBM10K  using 
Mastiff.  The  number  of  processors  is  scaled  by  powers  of  two  from  one  to  64.  While 
the  table  provides  specific  details,  trends  are  more  readily  observed  in  figure  5.1. 

Figure  5.1a  shows  the  execution  times.  Note  that  the  time  for  the  repl  phase  is 
not  shown  because  it  is  lower  than  the  minimum  y-axis  value.  It  is  excluded  to  give 
more  clarity  to  the  phases  that  dominate  the  overall  process.  All  phases  decrease  in 
time  with  number  of  processors.  The  total  time  does  not  change  significantly  from 
one  to  two  processors  because  adding  multiple  processors  introduces  overhead  that 
is  non-existent  for  a  single  processor.  In  this  case,  the  uniq  (deduplication)  phase 
is  the  specific  reason  for  this  initial  lack  of  scalability.  Also  note  that  starting  at  16 
processors,  disk  contention  when  writing  the  output  increases. 

Figures 5.1b and 5.1c show the relative speedup and efficiency curves (respectively). Initially, speedup for the infer phase is superlinear up to eight processors. This is likely due to the way in which CPU affinity is set as described in section 5.1.2, reducing cache contention among processors. Thus, adding more processors with a fixed workload causes there to be less data per processor, and each processor can better utilize its cache. This is most effective at four processors when an astonishing efficiency of 1.220 is achieved for the infer phase. At eight processors, cache sharing begins, and so efficiency begins to drop.

However, after 16 processors, efficiency begins to plummet, reaching a low of 0.548 at 64 processors. The question remains as to whether this efficiency is considered good for the problem at hand. The Karp-Flatt metric can be used to answer this question.

Table 5.5: Strong Scaling, Mastiff, Par-CoreRDFS, LUBM10K

p    Task    Min       Avg       Max       Dev     Sdup     Eff     KF
1    Load    44:08     44:08     44:08     0:00    1.000    1.000   -
1    Repl    0:00      0:00      0:00      0:00    -        -       -
1    Infer   2:45:22   2:45:22   2:45:22   0:00    1.000    1.000   -
1    Uniq    0:00      0:00      0:00      0:00    -        -       -
1    Write   9:48      9:48      9:48      0:00    1.000    1.000   -
1    Total   3:39:18   3:39:18   3:39:18   0:00    1.000    1.000   -
2    Load    20:09     20:19     20:30     0:10    2.153    1.076   -0.071
2    Repl    0:00      0:10      0:21      0:10    -        -       -
2    Infer   1:16:36   1:16:36   1:16:36   0:00    2.159    1.079   -0.074
2    Uniq    1:09:05   1:28:25   1:47:45   19:20   -        -       -
2    Write   6:02      6:05      6:09      0:03    1.593    0.797   0.255
2    Total   2:52:20   3:11:36   3:30:53   19:16   1.040    0.520   0.923
4    Load    9:27      9:31      9:39      0:05    4.573    1.143   -0.042
4    Repl    0:00      0:08      0:12      0:05    -        -       -
4    Infer   32:53     33:27     33:54     0:27    4.878    1.220   -0.060
4    Uniq    34:37     35:04     35:40     0:28    -        -       -
4    Write   3:03      3:07      3:13      0:04    3.047    0.762   0.104
4    Total   1:21:12   1:21:18   1:21:23   0:04    2.695    0.674   0.161
8    Load    4:23      4:41      5:20      0:22    8.275    1.034   -0.005
8    Repl    0:00      0:39      0:57      0:22    -        -       -
8    Infer   16:28     18:15     20:00     1:39    8.268    1.034   -0.005
8    Uniq    19:46     21:34     23:17     1:35    -        -       -
8    Write   1:27      1:37      2:00      0:13    4.900    0.613   0.090
8    Total   46:32     46:46     47:21     0:20    4.631    0.579   0.104
16   Load    2:13      2:19      2:51      0:12    15.485   0.968   0.002
16   Repl    0:00      0:32      0:38      0:12    -        -       -
16   Infer   7:34      8:26      10:16     0:58    16.107   1.007   0.000
16   Uniq    10:36     12:27     13:18     0:56    -        -       -
16   Write   0:48      0:55      1:12      0:07    8.167    0.510   0.064
16   Total   24:31     24:39     25:05     0:10    8.743    0.546   0.055
32   Load    1:19      1:25      1:50      0:09    24.073   0.752   0.011
32   Repl    0:00      0:25      0:31      0:09    -        -       -
32   Infer   4:06      4:35      6:04      0:39    27.258   0.852   0.006
32   Uniq    6:42      8:12      8:40      0:36    -        -       -
32   Write   0:29      0:50      1:07      0:15    8.776    0.274   0.085
32   Total   15:05     15:27     15:43     0:15    13.953   0.436   0.042
64   Load    0:52      0:56      1:20      0:09    33.100   0.517   0.015
64   Repl    0:00      0:24      0:28      0:09    -        -       -
64   Infer   2:20      2:48      4:43      0:47    35.060   0.548   0.013
64   Uniq    4:27      6:23      6:50      0:43    -        -       -
64   Write   0:24      0:41      0:51      0:07    11.529   0.180   0.072
64   Total   10:54     11:13     11:32     0:09    19.014   0.297   0.038

[Figure 5.1: Strong Scaling, Mastiff, Par-CoreRDFS, LUBM10K. Panels: (a) Execution Times, (b) Relative Speedup, (c) Efficiency, (d) Karp-Flatt Metric.]

Figure 5.1d shows the Karp-Flatt metric for Par-CoreRDFS inference on
LUBM10K using Mastiff. Recall that the Karp-Flatt metric is the experimentally
determined serial fraction of computation. Thus, zero is completely parallel, and
one is completely serial. At two processors, the Karp-Flatt metric for the infer
phase is initially negative due to superlinear speedup. The decrease in
parallelization due to cache contention is apparent from the sudden jump in the
metric between four and eight processors. The Karp-Flatt metric then increases
steadily, but very slowly, to a value of 0.013 at 64 processors. From this, it can
be concluded that although efficiency is only 0.548, it cannot be made much
better than that, and the decrease in efficiency is largely inherent to the
problem (that is, to Par-CoreRDFS inference on LUBM10K).
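
The reported values can be checked against the standard definition of the
Karp-Flatt metric, e = (1/S - 1/p) / (1 - 1/p), where S is the speedup on p
processors. The following is a minimal sketch in Python. The function names are
illustrative and are not part of the thesis's tooling. The inputs are the
infer-phase figures at 64 processors reported in table 5.5.

```python
def efficiency(speedup: float, p: int) -> float:
    """Parallel efficiency: speedup divided by the number of processors."""
    return speedup / p


def karp_flatt(speedup: float, p: int) -> float:
    """Experimentally determined serial fraction for p > 1 processors."""
    if p < 2:
        raise ValueError("the Karp-Flatt metric is undefined for p < 2")
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)


# Infer phase at 64 processors: reported speedup of 35.060.
print(round(efficiency(35.060, 64), 3))  # 0.548
print(round(karp_flatt(35.060, 64), 3))  # 0.013

# Superlinear speedup (S > p) drives the metric below zero.
print(karp_flatt(2.240, 2) < 0)  # True
```

A small serial fraction can still coexist with modest efficiency at 64
processors, since efficiency falls as p grows even when e stays fixed.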

The really interesting question, though, is what makes up the 1.3% of the
computation that is effectively serial. It cannot be communication because
inference is embarrassingly parallel. There are three apparent (and not mutually
exclusive) explanations. One explanation is contention for shared resources,
such as cache contention. Another explanation is redundant work, of which there
is certainly some small amount due to rules scm-sco and scm-spo, but these rules
represent a fixed amount of redundant work and cannot account for the increase
in the metric. The more obvious explanation is load-balancing problems, which
can be easily observed from the standard deviation in the processor times
reported in table 5.5. At 64 processors, the fastest processor takes 2:20 and
the slowest processor takes 4:43 to perform inference, with a standard deviation
of 0:47. Certainly, load-balancing is an issue.

A  common  and  simple  approach  to  load-balancing  is  static  random  allocation. 
In  static  random  allocation,  each  unit  of  data  (in  this  case,  each  triple  that  can  be 
placed  arbitrarily)  is  randomly  assigned  to  a  single  processor.  Preliminary  exper¬ 
iments  with  static  random  allocation  revealed  some  unexpected  and  non-intuitive 
findings.  The  goal  of  load-balancing  is  to  decrease  the  maximum  execution  time 
by  decreasing  the  standard  deviation  (i.e.,  spreading  out  work  more  evenly  across 
processors).  However,  my  preliminary  experiments  showed  that  static  random  allo¬ 
cation  indeed  decreased  standard  deviation,  but  also  increased  the  maximum  execu¬ 
tion  time  on  larger  numbers  of  processors.  Thus,  the  effect  is  the  opposite  of  what 
one  would  expect.  Additionally,  memory  consumption  was  much  higher,  so  much 
so  that  64  processors  caused  all  512  GB  of  Mastiff  to  be  exhausted. 

After some investigation, a logical explanation emerged. Recall from section
5.1.5 that triples in the LUBM10K dataset are naturally grouped by subject.
Consider the following example with subject grouping, based on characteristics
of real LUBM data (using the Turtle syntax instead of RIF frames).

ub:teacherOf rdfs:domain ub:Faculty .
ub:Faculty rdfs:subClassOf ub:Employee .
_:prof0 ub:teacherOf _:course0 .
_:prof1 ub:teacherOf _:course1 .
_:prof1 ub:teacherOf _:course2 .

With  the  data  in  this  order,  two  processors  would  be  assigned  data  as  follows. 

# Processor 0
ub:teacherOf rdfs:domain ub:Faculty .
ub:Faculty rdfs:subClassOf ub:Employee .
_:prof0 ub:teacherOf _:course0 .

# Processor 1
ub:teacherOf rdfs:domain ub:Faculty .
ub:Faculty rdfs:subClassOf ub:Employee .
_:prof1 ub:teacherOf _:course1 .
_:prof1 ub:teacherOf _:course2 .

Then  processors  will  draw  the  following  inferences. 

# Processor 0
_:prof0 rdf:type ub:Faculty .
_:prof0 rdf:type ub:Employee .

# Processor 1
_:prof1 rdf:type ub:Faculty .
_:prof1 rdf:type ub:Employee .

Now  consider  static  random  allocation.  It  is  then  possible  that  processors  can 
be  assigned  data  as  follows. 

# Processor 0
ub:teacherOf rdfs:domain ub:Faculty .
ub:Faculty rdfs:subClassOf ub:Employee .
_:prof1 ub:teacherOf _:course1 .

# Processor 1
ub:teacherOf rdfs:domain ub:Faculty .
ub:Faculty rdfs:subClassOf ub:Employee .
_:prof0 ub:teacherOf _:course0 .
_:prof1 ub:teacherOf _:course2 .

Then processors will draw the following inferences.

# Processor 0
_:prof1 rdf:type ub:Faculty .
_:prof1 rdf:type ub:Employee .

# Processor 1
_:prof0 rdf:type ub:Faculty .
_:prof0 rdf:type ub:Employee .
_:prof1 rdf:type ub:Faculty .
_:prof1 rdf:type ub:Employee .


There are two important differences to note. Firstly, even though the exact
same triples are inferred, there are more replicated inferences under static
random allocation than under subject grouping. This explains the increased
memory consumption. Secondly, there is more redundant work. Both processors had
to infer by rule cax-sco that since _:prof1 rdf:type ub:Faculty and ub:Faculty
rdfs:subClassOf ub:Employee, then _:prof1 rdf:type ub:Employee. This explains
the increased execution time. This shows that due to rules prp-dom and cax-sco,
scattering triples with the same subject across processors actually results in
more work. Presumably, the same could be said for prp-rng and cax-sco when
triples with the same object are scattered across processors.
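
The example above can be replayed in a few lines of Python. This is only a toy
sketch under stated assumptions: the `DOMAIN` and `SUBCLASS` dictionaries stand
in for the replicated ontology triples, `infer` applies just the domain step and
the subclass step used in the example, and none of these names come from the
thesis's inference engine.

```python
# Replicated ontology (available on every processor).
DOMAIN = {"ub:teacherOf": "ub:Faculty"}    # p rdfs:domain c
SUBCLASS = {"ub:Faculty": "ub:Employee"}   # c1 rdfs:subClassOf c2 (acyclic here)


def infer(triples):
    """Return the rdf:type triples one processor derives from its local data."""
    derived = set()
    for s, p, _o in triples:
        cls = DOMAIN.get(p)               # domain step (prp-dom)
        while cls is not None:
            derived.add((s, "rdf:type", cls))
            cls = SUBCLASS.get(cls)       # subclass step on the derived type
    return derived


def summarize(allocation):
    """Return (inferences stored across processors, distinct inferences)."""
    per_processor = [infer(part) for part in allocation]
    stored = sum(len(d) for d in per_processor)
    distinct = len(set().union(*per_processor))
    return stored, distinct


t0 = ("_:prof0", "ub:teacherOf", "_:course0")
t1 = ("_:prof1", "ub:teacherOf", "_:course1")
t2 = ("_:prof1", "ub:teacherOf", "_:course2")

grouped = [[t0], [t1, t2]]     # subject grouping
scattered = [[t1], [t0, t2]]   # one possible static random allocation

print(summarize(grouped))    # (4, 4)
print(summarize(scattered))  # (6, 4)
```

Both allocations yield the same four distinct triples, but the scattered one
stores six, which mirrors the extra memory and redundant derivations described
above.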

In  short,  while  one  would  typically  expect  static  random  allocation  to  decrease 
maximum  execution  time  by  more  evenly  distributing  data  among  processors,  this 
assumption  does  not  hold  for  general  inference.  In  the  case  of  Par-CoreRDFS  in¬ 
ference  on  LUBM10K,  static  random  allocation  actually  created  more  redundant 
work.  It  is  for  this  reason  that  I  have  left  the  files  in  their  natural  ordering  without 
any  initial  redistribution  (except  the  natural  side  effect  of  the  preprocessing  or  data 
generation).  This  suggests  that  load-balancing  of  parallel  inference  is  non-trivial. 
Determining  good  load-balancing  methods  for  parallel  inference  is  left  as  future 
work.  The  reader  should  keep  this  caveat  in  mind  when  interpreting  all  results  in 
this  chapter  and  remember  that  this  is  the  reason  that  explicit  load-balancing  has 
been  avoided. 

Urbani  has  previously  observed  this  effect  in  his  MapReduce-based  inference 
engine as noted in section 2.2.4 of [51]. He found that this effect made Map-side
joins  a  less  practical  solution  than  using  the  Map  phase  for  distribution  and  then 
performing  joins  in  the  Reduce  phase,  specifically  in  the  context  of  RDFS-based 
inference.  As  a  solution,  Urbani  distributes  non-ontology  triples  based  on  terms  in 
the  triples  that  will  be  used  to  bind  variables  in  rule  actions.  In  this  way,  the  Reduce 
phase  will  implicitly  perform  some  (but  not  necessarily  all)  deduplication  of  infer¬ 
ences.  Regardless,  part  of  his  evaluation  suggests  some  load-balancing  issues  remain 
for  real-world  datasets,  which  is  to  be  expected  due  to  skew  in  the  distribution  of 
terms  in  real-world  RDF  data. 
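
The general idea of term-based distribution can be sketched as a hash partition.
The code below is not Urbani's implementation. It is a hypothetical illustration
that keys every triple on its subject, so that triples sharing a subject land on
the same processor. The function names are invented for this sketch, and CRC32
is chosen only because it is deterministic across runs.

```python
import zlib


def processor_for(term: str, num_procs: int) -> int:
    """Map a term to a processor index with a deterministic hash."""
    return zlib.crc32(term.encode("utf-8")) % num_procs


def partition_by_subject(triples, num_procs):
    """Assign each (s, p, o) triple to the processor chosen by its subject."""
    parts = [[] for _ in range(num_procs)]
    for triple in triples:
        parts[processor_for(triple[0], num_procs)].append(triple)
    return parts


triples = [
    ("_:prof0", "ub:teacherOf", "_:course0"),
    ("_:prof1", "ub:teacherOf", "_:course1"),
    ("_:prof1", "ub:teacherOf", "_:course2"),
]
for index, part in enumerate(partition_by_subject(triples, 4)):
    print(index, part)
```

Keying on a single term keeps duplicate inferences for that term on one
processor, but a very frequent term still concentrates work there, which is the
skew problem noted above.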

Returning to the discussion of the Karp-Flatt metric for Par-CoreRDFS
inference on LUBM10K using Mastiff, as shown in figure 5.1d, regardless of the
poor load-balancing, the infer phase is almost completely parallel. The steady
(but slow) increase in the metric from eight to 64 processors can be attributed
to increased contention for shared resources, poor load-balancing, and redundant
work.

Now  consider  Par-MemOWL2  inference  on  LUBM10K  using  Mastiff.  The 
specific  metrics  are  given  in  table  5.6,  but  trends  can  be  better  observed  in  figure 
5.2.  The  execution  times  are  given  in  figure  5.2a.  Note  that  unlike  in  Par-CoreRDFS 
inference,  the  repl  phase  takes  a  significant  amount  of  time  that  decreases  with 
the  number  of  processors.  This  is  due  to  an  inefficiency  in  the  way  in  which  the 
data  to  be  replicated  is  queried  out  prior  to  actual  replication  (i.e.,  the  time  is 
not  a  reflection  of  the  cost  of  communication).  In  Par-CoreRDFS,  the  triples  to  be 
replicated  were  looked  up  directly  via  an  index,  but  here,  they  are  scanned  out.  This 
is  an  unintended  side  effect  caused  by  a  combination  of  the  naive  inference  engine 
and the way in which replication patterns were specified (in rule files; see section
B.3)  for  Par-MemOWL2.  Regardless,  the  repl  phase  still  takes  the  least  amount  of 
time,  and  so  this  inefficiency  should  not  heavily  impact  the  overall  trend. 

The speedup and efficiency curves in figures 5.2b and 5.2c, respectively, show
a trend similar to that of Par-CoreRDFS inference on LUBM10K. Superlinear
speedup (for the infer phase) is achieved on lower numbers of processors due to
little or no cache contention, and then for 32 and 64 processors, efficiency
decreases significantly. At 64 processors, efficiency for the infer phase is
0.676. This brings up an interesting question in comparison to the efficiency
achieved with the Par-CoreRDFS ruleset, which was 0.548 at 64 processors. How is
it that inference with a more complex ruleset like Par-MemOWL2 can be more
efficient than with a simpler ruleset like Par-CoreRDFS? The apparent
explanation is that there is proportionally more parallel work with Par-MemOWL2
inference. In other words, even though there is an increase in the sequential
work due to an (effectively) larger ontology (i.e., more replicated TBox
inference), there is a proportionally larger increase in the parallel work
(i.e., even more distributed ABox inference). This is confirmed by the
Karp-Flatt metric, which at 64 processors is 0.008 for Par-MemOWL2 and 0.013 for
Par-CoreRDFS. This means that, non-intuitively, parallel Par-MemOWL2 inference
is more scalable (in the strong scaling sense) than parallel Par-CoreRDFS
inference, at least for LUBM data.

Table 5.6: Strong Scaling, Mastiff, Par-MemOWL2, LUBM10K

 p  Task        Min       Avg       Max    Dev    Sdup    Eff      KF
 1  Load      43:30     43:30     43:30   0:00   1.000  1.000       -
 1  Repl       0:00      0:00      0:00   0:00       -      -       -
 1  Infer  10:58:30  10:58:30  10:58:30   0:00   1.000  1.000       -
 1  Uniq       0:00      0:00      0:00   0:00       -      -       -
 1  Write     10:31     10:31     10:31   0:00   1.000  1.000       -
 1  Total  11:52:31  11:52:31  11:52:31   0:00   1.000  1.000       -
 2  Load      20:13     20:15     20:18   0:02   2.143  1.071  -0.067
 2  Repl       4:16      4:18      4:21   0:02       -      -       -
 2  Infer   4:54:01   4:54:01   4:54:01   0:00   2.240  1.120  -0.107
 2  Uniq    1:07:08   1:07:18   1:07:28   0:10       -      -       -
 2  Write      6:19      6:20      6:21   0:01   1.656  0.828   0.208
 2  Total   6:32:04   6:32:13   6:32:22   0:09   1.816  0.908   0.101
 4  Load       9:26      9:33      9:38   0:05   4.516  1.129  -0.038
 4  Repl       2:05      2:10      2:17   0:05       -      -       -
 4  Infer   2:16:42   2:16:54   2:16:58   0:07   4.808  1.202  -0.056
 4  Uniq      40:36     43:03     49:51   3:56       -      -       -
 4  Write      3:24      4:18      6:35   1:19   1.597  0.399   0.501
 4  Total   3:12:52   3:15:58   3:21:56   3:42   3.528  0.882   0.045
 8  Load       4:26      4:35      5:20   0:17   8.156  1.020  -0.003
 8  Repl       1:25      2:10      2:19   0:17       -      -       -
 8  Infer   1:03:24   1:15:06   1:23:16   7:04   7.908  0.989   0.002
 8  Uniq      23:09     31:22     43:01   7:02       -      -       -
 8  Write      1:47      1:56      2:20   0:11   4.507  0.563   0.111
 8  Total   1:54:57   1:55:09   1:55:47   0:16   6.154  0.769   0.043
16  Load       2:14      2:16      2:19   0:01  18.777  1.174  -0.010
16  Repl       0:37      0:40      0:42   0:01       -      -       -
16  Infer     31:10     33:55     37:37   2:38  17.506  1.094  -0.006
16  Uniq      12:32     16:15     18:59   2:37       -      -       -
16  Write      0:58      1:01      1:04   0:02   9.859  0.616   0.042
16  Total     54:03     54:07     54:13   0:03  13.142  0.821   0.014
32  Load       1:19      1:20      1:24   0:01  31.071  0.971   0.001
32  Repl       0:20      0:24      0:25   0:01       -      -       -
32  Infer     17:21     18:23     21:29   1:25  30.652  0.958   0.001
32  Uniq       7:51     10:57     11:59   1:25       -      -       -
32  Write      0:36      0:58      1:18   0:18   8.090  0.253   0.095
32  Total     31:40     32:02     32:22   0:18  22.014  0.688   0.015
64  Load       0:51      0:52      1:00   0:01  43.500  0.680   0.007
64  Repl       0:15      0:23      0:24   0:01       -      -       -
64  Infer     10:02     11:13     15:13   1:44  43.275  0.676   0.008
64  Uniq       5:10      9:10     10:21   1:44       -      -       -
64  Write      0:32      0:47      0:54   0:05  11.685  0.183   0.071
64  Total     22:11     22:25     22:32   0:05  31.621  0.494   0.016

Again, the Karp-Flatt metric in figure 5.2d reveals that Par-MemOWL2
inference on LUBM10K is almost completely parallel, even for the total process.
The increase in the metric for the inference phase from two to eight processors
can be explained by poorer load-balancing. After eight processors, the inference
phase appears to remain nearly completely parallel.

LUBM data is unrealistic and “easy” for inference relative to real-world
datasets. Thus, the results shown thus far give an indication of the scale that
is achievable with the restricted rulesets (Par-CoreRDFS and Par-MemOWL2) in an
ideal scenario. At this point, I turn to the real-world BTC2012 dataset, which
represents a worst-case scenario. Recall from section 5.1.5 that Par-MemOWL2
inference is not performed on BTC2012 due to memory exhaustion, although in
light of the recent discussion of load-balancing, it may not be entirely
impossible on Mastiff or the Blue Gene/Q if load-balancing can be improved.
Another possible cause is an excessively large ontology, which cannot be
remedied given the embarrassingly parallel paradigm held fixed in this
evaluation. As mentioned, though, load-balancing techniques are left as future
work.

The metrics for Par-CoreRDFS inference on BTC2012 (using Mastiff) are given
in table 5.7. In figure 5.3a, the change in execution times is visualized. Note
that the repl phase does not appear because the time is so low. A noticeable
difference between Par-CoreRDFS inference on LUBM10K and the same on BTC2012 is
that, with BTC2012, the infer phase dominates the total execution time. With
LUBM10K, the infer phase and the uniq phase took about the same amount of time.
This is because far more inferences are drawn from BTC2012 than from LUBM10K.

Figure 5.2: Strong Scaling, Mastiff, Par-MemOWL2, LUBM10K
(panels include (b) Relative Speedup and (d) Karp-Flatt Metric)

Speedup  and  efficiency,  shown  in  figures  5.3b  and  5.3c  (respectively)  suggest 
significantly  poorer  scalability  (in  the  strong  scaling  sense)  for  the  infer  phase  than 
with  LUBM10K.  Superlinear  speedup  is  achieved  at  two  and  four  processors,  but 
that  ends  sharply  at  eight  processors.  Efficiency  drastically  decreases  thereafter, 
ending  at  a  low  0.245  at  64  processors.  The  question  remains,  though,  whether  this 
is  good  for  the  problem  at  hand. 

The  Karp-Flatt  metric  visualized  in  figure  5.3d  indicates  that,  indeed,  the 
efficiency  of  0.245  is  actually  good  at  64  processors  because  the  infer  phase  is  roughly 
95%  parallel  (i.e.  roughly  5%  serial).  As  previously  mentioned,  the  serial  portion 

Table 5.7: Strong Scaling, Mastiff, Par-CoreRDFS, BTC2012

 p  Task        Min       Avg       Max    Dev    Sdup    Eff      KF
 1  Load      33:13     33:13     33:13   0:00   1.000  1.000       -
 1  Repl       0:00      0:00      0:00   0:00       -      -       -
 1  Infer  10:03:05  10:03:05  10:03:05   0:00   1.000  1.000       -
 1  Uniq       0:00      0:00      0:00   0:00       -      -       -
 1  Write     11:38     11:38     11:38   0:00   1.000  1.000       -
 1  Total  10:47:56  10:47:56  10:47:56   0:00   1.000  1.000       -
 2  Load      14:09     15:06     16:04   0:57   2.067  1.034  -0.033
 2  Repl       0:00      0:57      1:55   0:57       -      -       -
 2  Infer   4:36:03   4:36:03   4:36:03   0:00   2.185  1.092  -0.085
 2  Uniq    1:25:27   1:25:27   1:25:28   0:00       -      -       -
 2  Write      7:12      7:23      7:35   0:11   1.534  0.767   0.304
 2  Total   6:24:47   6:24:59   6:25:11   0:12   1.682  0.841   0.189
 4  Load       6:16      7:19      9:18   1:11   3.572  0.893   0.040
 4  Repl       0:01      1:59      3:03   1:11       -      -       -
 4  Infer   1:44:28   2:11:49   2:20:56  15:47   4.279  1.070  -0.022
 4  Uniq      53:34   1:02:41   1:30:00  15:46       -      -       -
 4  Write      4:15      4:36      5:32   0:33   2.102  0.526   0.301
 4  Total   3:28:05   3:28:27   3:29:27   0:35   3.093  0.773   0.098
 8  Load       2:35      3:32      4:39   0:49   7.143  0.893   0.017
 8  Repl       0:00      1:07      2:04   0:48       -      -       -
 8  Infer     48:48   1:08:22   1:33:52  16:17   6.425  0.803   0.035
 8  Uniq      34:03   1:04:45   1:29:47  18:51       -      -       -
 8  Write      1:53      2:19      3:29   0:29   3.340  0.417   0.199
 8  Total   2:14:10   2:20:08   2:34:23   6:43   4.197  0.525   0.129
16  Load       1:08      1:44      2:39   0:27  12.535  0.783   0.018
16  Repl       0:00      0:55      1:31   0:27       -      -       -
16  Infer      8:40     36:11     51:19  11:35  11.752  0.735   0.024
16  Uniq      21:17     39:09   1:03:43  12:55       -      -       -
16  Write      1:02      1:12      1:26   0:07   8.116  0.507   0.065
16  Total   1:16:04   1:19:11   1:25:48   3:52   7.552  0.472   0.075
32  Load       0:40      0:59      1:35   0:15  20.979  0.656   0.017
32  Repl       0:00      0:36      0:55   0:15       -      -       -
32  Infer      6:06     23:25     50:08  10:28  12.030  0.376   0.054
32  Uniq      18:57     46:43   1:02:43  10:42       -      -       -
32  Write      0:37      1:08      1:48   0:28   6.463  0.202   0.127
32  Total   1:11:14   1:12:51   1:18:33   2:15   8.249  0.258   0.093
64  Load       0:26      0:39      1:02   0:09  32.145  0.502   0.016
64  Repl       0:00      0:23      0:36   0:09       -      -       -
64  Infer      4:48     15:05     38:28   7:40  15.678  0.245   0.049
64  Uniq      15:31     38:53     49:07   7:38       -      -       -
64  Write      0:29      0:52      1:09   0:13  10.116  0.158   0.085
64  Total     55:28     55:53     56:08   0:14  11.543  0.180   0.072

Figure 5.3: Strong Scaling, Mastiff, Par-CoreRDFS, BTC2012


of  computation  can  be  attributed  to  poor  load-balancing  (which  is  evident  from  the 
standard  deviations  in  table  5.7),  contention  for  shared  resources,  and  redundant 
work. 


5.2.1.2 Blue Gene/Q

The Blue Gene/Q is a very different architecture from Mastiff, as described in
section 5.1.2. Most notably, it is a distributed memory architecture in which
each node has 16 cores and 16 GB of memory. Strong scaling is performed for 512,
1,024, and 2,048 nodes. For inference on LUBM10K, 16 processors are assigned to
a node, and so 8,192 is the base number of processors used for computing
metrics. Due to the need for a large amount of memory per processor for
inference on BTC2012, only one processor is assigned per node. It should be
noted that, in the case of BTC2012, the computational power of the Blue Gene/Q
is severely underutilized. Future work includes adding Pthreads to the inference
engine to better take advantage of the computational resources of each
individual node.

For inference on LUBM10K, times are reported down to hundredths of a second
so as not to lose precision in trends due to very low times. Due to a minor bug
in the timing code, however, standard deviations are reported in seconds.

The  metrics  for  strong  scaling  Par-CoreRDFS  inference  on  LUBM10K  using 
the  Blue  Gene/Q  are  given  in  table  5.8.  Note  that  the  time  for  8,192  processors  is 
used  as  the  base  for  computing  metrics.  The  execution  times  in  figure  5.4a  indicate 
that  the  majority  of  the  total  process  is  dominated  by  disk  I/O  and  the  uniq  phase. 
The  infer  phase  itself  takes  4.18,  2.04,  and  0.99  seconds  on  8,192,  16,384,  and  32,768 
processors,  respectively. 

Figure  5.4b  shows  that,  while  relative  speedup  for  the  total  process  is  terrible, 
the  infer  phase  achieves  superlinear  speedup.  This  is  also  reflected  in  efficiency  shown 
in  figure  5.4c.  Additionally,  the  Karp-Flatt  metric  in  figure  5.4d  indicates  that  the 
infer  phase  is  completely  parallel.  In  other  words,  it  appears  that  Par-CoreRDFS 
inference  on  LUBM  data  is  completely  scalable  (in  the  strong  scaling  sense)  on  the 
Blue  Gene/Q.  On  the  downside,  the  total  process  is  not  actually  getting  any  faster. 

Par-MemOWL2  inference  on  LUBM10K  shows  the  same  trends.  The  metrics 
are  given  in  table  5.9.  The  execution  times  in  figure  5.5a  show  that  at  8,192  pro¬ 
cessors,  the  infer  and  uniq  phases  dominate  the  total  process,  but  that  changes  as 
the  number  of  processors  increases,  causing  disk  I/O  to  dominate  the  total  process. 
In  fact,  time  to  write  to  disk  increases.  Superlinear  speedup  is  still  achieved  for 
the  infer  phase  as  shown  by  relative  speedup  in  figure  5.5b  and  efficiency  in  figure 
5.5c.  The  Karp-Flatt  metric  in  figure  5.5d  shows  that  the  infer  phase  is  completely 
parallel. 

While  these  results  on  LUBM10K  are  exciting  and  quite  impressive,  it  must  be 
remembered  that  LUBM  data  is  unrealistic  and  that  the  data  is  naturally  organized 
when  generated.  The  same  cannot  be  said  for  BTC2012.  The  metrics  for  Par- 
CoreRDFS  inference  on  BTC2012  are  given  in  table  5.10.  It  must  be  noted  that 
the  maximum  infer  time  far  exceeds  the  average  for  any  number  of  processors  listed. 
This  indicates  that  in  the  BTC2012  dataset  (thinking  of  it  as  a  sorted  N-triples 

Table 5.8: Strong Scaling, Blue Gene/Q, Par-CoreRDFS, LUBM10K

    p  Task        Min       Avg       Max    Dev  Sdup8192  Eff8192  KF8192
 8192  Load    0:00.95   0:05.40   0:10.11   0:01     1.000    1.000       -
 8192  Repl    0:00.27   0:04.47   0:08.47   0:01         -        -       -
 8192  Infer   0:03.42   0:03.51   0:04.18   0:00     1.000    1.000       -
 8192  Uniq    0:10.76   0:11.40   0:11.48   0:00         -        -       -
 8192  Write   0:01.31   0:03.82   0:07.78   0:02     1.000    1.000       -
 8192  Total   0:26.92   0:29.43   0:33.39   0:02     1.000    1.000       -
16384  Load    0:00.52   0:06.85   0:09.62   0:01     1.050    0.525   0.905
16384  Repl    0:00.45   0:02.15   0:06.43   0:01         -        -       -
16384  Infer   0:01.39   0:01.71   0:02.04   0:00     2.041    1.020  -0.020
16384  Uniq    0:06.17   0:06.50   0:06.80   0:00         -        -       -
16384  Write   0:01.47   0:04.80   0:05.49   0:01     1.416    0.708   0.412
16384  Total   0:20.24   0:23.56   0:24.25   0:01     1.377    0.688   0.453
32768  Load    0:00.26   0:05.91   0:07.49   0:01     1.350    0.337   0.655
32768  Repl    0:00.79   0:01.76   0:06.95   0:01         -        -       -
32768  Infer   0:00.43   0:00.83   0:00.99   0:00     4.207    1.052  -0.016
32768  Uniq    0:03.97   0:04.12   0:04.50   0:00         -        -       -
32768  Write   0:03.47   0:11.54   0:13.81   0:02     0.564    0.141   2.032
32768  Total   0:17.41   0:25.48   0:27.76   0:02     1.203    0.301   0.775

Figure 5.4: Strong Scaling, Blue Gene/Q, Par-CoreRDFS, LUBM10K
(panels include (b) Relative Speedup and (d) Karp-Flatt Metric)

Table 5.9: Strong Scaling, Blue Gene/Q, Par-MemOWL2, LUBM10K

    p  Task        Min       Avg       Max    Dev  Sdup8192  Eff8192  KF8192
 8192  Load    0:00.83   0:04.52   0:08.03   0:06     1.000    1.000       -
 8192  Repl    0:00.58   0:03.77   0:07.02   0:06         -        -       -
 8192  Infer   0:12.44   0:12.70   0:14.99   0:00     1.000    1.000       -
 8192  Uniq    0:12.25   0:14.50   0:14.76   0:00         -        -       -
 8192  Write   0:01.04   0:03.37   0:06.88   0:02     1.000    1.000       -
 8192  Total   0:37.44   0:39.77   0:43.28   0:02     1.000    1.000       -
16384  Load    0:00.28   0:06.87   0:09.98   0:01     0.805    0.402   1.485
16384  Repl    0:00.60   0:02.51   0:07.60   0:00         -        -       -
16384  Infer   0:05.09   0:06.19   0:07.35   0:01     2.039    1.020  -0.019
16384  Uniq    0:07.03   0:08.17   0:09.25   0:01         -        -       -
16384  Write   0:01.78   0:04.85   0:11.60   0:01     0.593    0.296   2.374
16384  Total   0:27.63   0:30.72   0:37.46   0:01     1.155    0.578   0.731
32768  Load    0:00.16   0:05.46   0:07.41   0:01     1.084    0.271   0.897
32768  Repl    0:00.91   0:01.88   0:07.60   0:01         -        -       -
32768  Infer   0:01.55   0:02.99   0:03.57   0:01     4.189    1.047  -0.015
32768  Uniq    0:04.70   0:05.28   0:06.70   0:01         -        -       -
32768  Write   0:02.65   0:10.71   0:15.52   0:02     0.443    0.111   2.675
32768  Total   0:20.49   0:28.54   0:33.33   0:02     1.298    0.325   0.694

Figure 5.5: Strong Scaling, Blue Gene/Q, Par-MemOWL2, LUBM10K
(panels include (b) Relative Speedup and (d) Karp-Flatt Metric)

file), some regions result in far more inferences than others. This is not
surprising since my colleagues and I had already witnessed such an effect in
performing partial RDFS inference on the 2009 Billion Triples Challenge dataset
(BTC2009) [44].
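
One way to put a number on this skew is the ratio of the maximum to the average
infer time per processor count. The thesis does not report this ratio. It is
introduced here only as an illustrative summary, computed from the Avg and Max
infer times in table 5.10, and the helper names are hypothetical.

```python
def to_seconds(t: str) -> int:
    """Convert 'M:SS' or 'H:MM:SS' to seconds."""
    seconds = 0
    for part in t.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def imbalance(avg: str, maximum: str) -> float:
    """Ratio of the slowest processor's time to the average time."""
    return to_seconds(maximum) / to_seconds(avg)


# (Avg, Max) infer times from table 5.10, keyed by number of processors.
infer_times = {
    512: ("15:04", "1:04:10"),
    1024: ("12:22", "1:00:28"),
    2048: ("10:53", "35:02"),
}
for procs, (avg, maximum) in infer_times.items():
    print(procs, round(imbalance(avg, maximum), 2))
```

A ratio of 1.0 would mean perfectly even infer times. In all three
configurations the slowest processor takes more than three times the average.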

Of  greatest  interest,  though,  is  that  Par-CoreRDFS  inference  on  BTC2012 
with  512/1,024  processors  is  slower  (considering  the  maximum  processor  time)  than 
it  is  on  Mastiff  with  64  processors.  On  2,048  processors,  the  maximum  time  is 
about  the  same.  Note  also  that  the  average  processor  time  continues  to  decrease. 
The  most  likely  explanation  is  that  some  processor  is  assigned  a  very  concentrated, 
“inference-rich”  region  of  the  BTC2012  dataset,  and  that  processor  is  pushing  its 
memory  limits.  On  Mastiff,  this  was  not  an  issue  because  processors  requiring 
less  memory  allowed  other  processors  to  have  more  memory,  and  so  the  skew  in 
memory  requirement  is  tolerable  due  to  the  large  shared  memory.  Such  is  not 
the  case  on  the  Blue  Gene/Q.  Better  load-balancing  is  much  needed  to  improve 
scalability  on  distributed  memory  architectures.  An  obvious  possible  solution  is  to 
allow  communication  between  processors,  taking  advantage  of  the  Blue  Gene/Q’s 
high-performance  network.  This  is  outside  the  scope  of  the  study  of  this  thesis,  and 
so  it  remains  as  future  work. 

Regardless, figure 5.6 indicates poor scaling from 512 to 1,024 processors, but
good scaling from 1,024 to 2,048 processors. Using 512 processors as the base
for computing metrics, the infer phase is roughly 40% serial at 2,048
processors. Again, this is largely due to load-balancing issues, which are not
easily or intuitively solved, as discussed in section 5.2.1.1. That is, the
sorted order of the BTC2012 dataset has hurt here, but static random allocation
would likely exacerbate the memory issues. More investigation is needed, but I
conjecture that using static random allocation would give the impression of
better scalability (since relative speedup is being used) but not necessarily
faster execution time. Urbani’s [51] method of data distribution may help,
although it is not completely clear since load-balancing was still an issue for
him on real-world datasets.
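
The metrics relative to a base of 512 processors can be recomputed from the
maximum infer times in table 5.10. The sketch below assumes that relative
speedup is the base time divided by the time on p processors, and that
efficiency and the Karp-Flatt metric use the ratio p/512 in place of p. These
assumptions are not stated in this passage, but they reproduce the tabulated
values. The function names are illustrative.

```python
def to_seconds(t: str) -> int:
    """Convert 'M:SS' or 'H:MM:SS' to seconds."""
    seconds = 0
    for part in t.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def relative_metrics(base_p: int, base_time: str, p: int, time: str):
    """Return (relative speedup, efficiency, Karp-Flatt) against a base run."""
    if p <= base_p:
        raise ValueError("p must exceed the base number of processors")
    k = p / base_p                                    # scale-up factor
    speedup = to_seconds(base_time) / to_seconds(time)
    eff = speedup / k
    kf = (1.0 / speedup - 1.0 / k) / (1.0 - 1.0 / k)
    return speedup, eff, kf


# Maximum infer times: 1:04:10 at 512, 1:00:28 at 1,024, 35:02 at 2,048.
for p, t in ((1024, "1:00:28"), (2048, "35:02")):
    s, e, kf = relative_metrics(512, "1:04:10", p, t)
    print(p, round(s, 3), round(e, 3), round(kf, 3))
```

For 2,048 processors this gives a Karp-Flatt value of about 0.395, which is the
"roughly 40% serial" figure quoted above.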

Table 5.10: Strong Scaling, Blue Gene/Q, Par-CoreRDFS, BTC2012

    p  Task        Min       Avg       Max    Dev   Sdup512   Eff512   KF512
  512  Load       0:05      0:10      0:17   0:01     1.000    1.000       -
  512  Repl       0:02      0:09      0:14   0:01         -        -       -
  512  Infer      8:28     15:04   1:04:10   6:35     1.000    1.000       -
  512  Uniq      20:00   1:09:09   1:15:43   6:33         -        -       -
  512  Write      0:03      0:06      0:08   0:01     1.000    1.000       -
  512  Total   1:24:33   1:24:38   1:25:13   0:03     1.000    1.000       -
 1024  Load       0:02      0:05      0:09   0:01     1.889    0.944   0.059
 1024  Repl       0:01      0:05      0:08   0:01         -        -       -
 1024  Infer      8:11     12:22   1:00:28   4:27     1.061    0.531   0.885
 1024  Uniq      17:44   1:05:51   1:10:01   4:26         -        -       -
 1024  Write      0:02      0:04      0:08   0:01     1.000    0.500   1.000
 1024  Total   1:18:23   1:18:26   1:18:59   0:02     1.079    0.539   0.854
 2048  Load       0:00      0:03      0:04   0:01     4.250    1.062  -0.020
 2048  Repl       0:01      0:03      0:06   0:01         -        -       -
 2048  Infer      8:02     10:53     35:02   1:44     1.832    0.458   0.395
 2048  Uniq      10:25     34:35     37:25   1:44         -        -       -
 2048  Write      0:01      0:01      0:03   0:00     2.667    0.667   0.167
 2048  Total     45:34     45:35     45:54   0:01     1.857    0.464   0.385

Figure 5.6: Strong Scaling, Blue Gene/Q, Par-CoreRDFS, BTC2012
(panels include (b) Relative Speedup and (d) Karp-Flatt Metric)

5.2.2 Data Scaling

Now I turn to a different kind of scalability called data scaling, which was
introduced in section 5.1.3. In short, data scaling is concerned with how well
execution time can be held constant (or rather, kept from increasing) as
processors are added, holding the ratio of input size to number of processors
fixed. This perspective is data-centric in that it is concerned with handling
more data, not executing faster. Thus, it is a form of weak scaling.

The  metric  for  data  scaling  is  called  growth  efficiency  [7].  A  value  of  one 
indicates  that  scaling  number  of  processors  linearly  with  input  size  holds  execution 
time  constant.  A  value  less  than  one  indicates  that  adding  more  processors  cannot 
handle  (linearly)  larger  input  without  increasing  the  execution  time. 
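
The growth efficiency values in table 5.11 are consistent with dividing the
one-processor time by the maximum time on p processors (with the input scaled
in proportion to p). The definition itself is given in section 5.1.3, outside
this passage, so the ratio below is an assumption checked only against the
tabulated values. The function names are illustrative.

```python
def to_seconds(t: str) -> int:
    """Convert 'M:SS' or 'H:MM:SS' to seconds."""
    seconds = 0
    for part in t.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def growth_efficiency(base_time: str, time_at_p: str) -> float:
    """Base (one-processor) time over the time on p processors."""
    return to_seconds(base_time) / to_seconds(time_at_p)


# Total time: 1:47 on one processor, 3:26 (maximum) on two processors.
print(round(growth_efficiency("1:47", "3:26"), 3))  # 0.519
# Infer time: 1:21 on one processor, 1:44 (maximum) on 32 processors.
print(round(growth_efficiency("1:21", "1:44"), 3))  # 0.779
```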

The data scaling evaluation is performed by first having the maximum number
of processors for the given scenario perform inference on the entire dataset.
Then half of the processors perform inference on the first half of the dataset.
Then a quarter of the processors perform inference on the first quarter of the
dataset, and so on. Thus, data scaling is very susceptible to distribution skew
in the dataset files, and this should be kept in mind when interpreting all
results in this section.
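
The halving procedure can be written down as a simple schedule. This is a
sketch only: it assumes the maximum number of processors is a power of two and
measures the dataset prefix in triples, and both the function name and the
triple count in the usage line are hypothetical.

```python
def data_scaling_schedule(max_procs: int, total_triples: int):
    """List (processors, leading triples to process), halving both each step."""
    schedule = []
    procs = max_procs
    while procs >= 1:
        # Keep the ratio of input size to processors fixed.
        schedule.append((procs, total_triples * procs // max_procs))
        procs //= 2
    return schedule


for procs, prefix in data_scaling_schedule(64, 1000):
    print(procs, prefix)
```

Because each run takes a prefix of the dataset, any region that is unusually
rich in inferences affects only the runs whose prefix reaches it.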

Recall  from  section  5.1.5  that  the  replication  phase  is  performed  as  a  prepro¬ 
cessing  step  for  data  scaling.  Thus,  the  times  for  repl  reported  in  the  tables  and 
figures  simply  indicate  how  long  processors  waited  at  a  barrier. 

5.2.2.1 Mastiff

Data  scaling  on  Mastiff  is  not  entirely  sensible.  As  an  SMP  machine,  scaling 
up  processors  within  a  single  machine  does  not  provide  additional  (space-related) 
resources  for  handling  more  data.  Regardless,  a  data  scaling  evaluation  can  reveal 
characteristics  about  the  data  and  the  machine  that  can  provide  more  insights  into 
the  ruleset  and  dataset  under  consideration. 

The  metrics  for  Par-CoreRDFS  inference  on  LUBM10K  are  given  in  table  5.11. 
Note  that  even  though  the  task  is  effectively  the  same  for  64  processors  as  it  was  in 
strong  scaling,  the  times  are  generally  faster  in  the  data  scaling  results.  The  greatest 
contributing  factor  to  this  difference  appears  to  be  that  loading  the  data  was  faster. 




Table 5.11: Data Scaling, Mastiff, Par-CoreRDFS, LUBM10K

 p   Task     Min    Avg    Max    Dev   GEff
 1   Load    0:21   0:21   0:21   0:00  1.000
     Repl    0:00   0:00   0:00   0:00      -
     Infer   1:21   1:21   1:21   0:00  1.000
     Uniq    0:00   0:00   0:00   0:00      -
     Write   0:05   0:05   0:05   0:00  1.000
     Total   1:47   1:47   1:47   0:00  1.000
 2   Load    0:22   0:22   0:23   0:00  0.913
     Repl    0:00   0:00   0:01   0:00      -
     Infer   1:21   1:21   1:21   0:00  1.000
     Uniq    1:37   1:37   1:37   0:00      -
     Write   0:05   0:05   0:05   0:00  1.000
     Total   3:26   3:26   3:26   0:00  0.519
 4   Load    0:22   0:23   0:23   0:00  0.913
     Repl    0:00   0:00   0:01   0:00      -
     Infer   1:22   1:23   1:24   0:01  0.964
     Uniq    1:56   1:57   1:58   0:01      -
     Write   0:06   0:06   0:06   0:00  0.833
     Total   3:49   3:49   3:49   0:00  0.467
 8   Load    0:21   0:23   0:23   0:01  0.913
     Repl    0:00   0:00   0:02   0:01      -
     Infer   1:22   1:23   1:24   0:01  0.964
     Uniq    2:16   2:17   2:18   0:01      -
     Write   0:06   0:07   0:07   0:00  0.714
     Total   4:09   4:10   4:10   0:00  0.428
16   Load    0:23   0:24   0:25   0:01  0.840
     Repl    0:00   0:01   0:02   0:01      -
     Infer   1:25   1:27   1:28   0:01  0.920
     Uniq    2:29   2:30   2:32   0:01      -
     Write   0:07   0:21   0:22   0:04  0.227
     Total   4:29   4:43   4:44   0:04  0.377
32   Load    0:26   0:46   1:07   0:19  0.313
     Repl    0:00   0:21   0:41   0:19      -
     Infer   1:38   1:42   1:44   0:02  0.779
     Uniq    3:09   3:11   3:15   0:02      -
     Write   0:09   0:25   0:28   0:03  0.179
     Total   6:11   6:27   6:30   0:03  0.274
64   Load    0:31   0:33   0:45   0:02  0.467
     Repl    0:00   0:12   0:14   0:02      -
     Infer   2:20   2:45   3:14   0:14  0.418
     Uniq    4:19   4:49   5:13   0:13      -
     Write   0:20   0:34   0:45   0:04  0.111
     Total   8:42   8:54   9:06   0:04  0.196

Table 5.12: Data Scaling, Mastiff, Par-MemOWL2, LUBM10K

 p   Task     Min    Avg    Max    Dev   GEff
 1   Load    0:22   0:22   0:22   0:00  1.000
     Repl    0:00   0:00   0:00   0:00      -
     Infer   5:05   5:05   5:05   0:00  1.000
     Uniq    0:00   0:00   0:00   0:00      -
     Write   0:05   0:05   0:05   0:00  1.000
     Total   5:32   5:32   5:32   0:00  1.000
 2   Load    0:21   0:21   0:22   0:00  1.000
     Repl    0:00   0:00   0:01   0:00      -
     Infer   5:06   5:07   5:08   0:01  0.990
     Uniq    1:58   1:59   2:00   0:01      -
     Write   0:06   0:06   0:06   0:00  0.833
     Total   7:34   7:34   7:34   0:00  0.731
 4   Load    0:22   0:22   0:23   0:00  0.957
     Repl    0:00   0:01   0:01   0:00      -
     Infer   5:07   5:07   5:08   0:00  0.990
     Uniq    2:21   2:21   2:22   0:00      -
     Write   0:07   0:07   0:07   0:00  0.714
     Total   7:59   7:59   7:59   0:00  0.693
 8   Load    0:21   0:23   0:24   0:01  0.917
     Repl    0:00   0:01   0:03   0:01      -
     Infer   5:04   5:35   6:20   0:35  0.803
     Uniq    2:43   3:28   3:59   0:35      -
     Write   0:08   0:08   0:11   0:01  0.455
     Total   9:35   9:35   9:38   0:01  0.574
16   Load    0:22   0:23   0:26   0:01  0.846
     Repl    0:00   0:03   0:04   0:01      -
     Infer   5:21   5:55   7:13   0:46  0.704
     Uniq    2:57   4:15   4:49   0:45      -
     Write   0:09   0:12   0:14   0:02  0.357
     Total  10:45  10:48  10:51   0:02  0.510
32   Load    0:27   0:27   0:29   0:01  0.759
     Repl    0:00   0:01   0:02   0:01      -
     Infer   6:16   7:13  10:00   1:20  0.508
     Uniq    3:47   6:34   7:31   1:19      -
     Write   0:12   0:24   0:34   0:09  0.147
     Total  14:28  14:40  14:50   0:09  0.373
64   Load    0:31   0:32   0:39   0:02  0.564
     Repl    0:00   0:07   0:08   0:02      -
     Infer   7:49  11:02  14:55   2:43  0.341
     Uniq    5:05   8:59  12:11   2:42      -
     Write   0:21   0:39   0:47   0:08  0.106
     Total  21:00  21:19  21:26   0:07  0.258

Table 5.13: Data Scaling, Mastiff, Par-CoreRDFS, BTC2012

 p   Task     Min    Avg    Max    Dev   GEff
 1   Load    0:17   0:17   0:17   0:00  1.000
     Repl    0:00   0:00   0:00   0:00      -
     Infer   9:47   9:47   9:47   0:00  1.000
     Uniq    0:00   0:00   0:00   0:00      -
     Write   0:09   0:09   0:09   0:00  1.000
     Total  10:13  10:13  10:13   0:00  1.000
 2   Load    0:17   0:17   0:17   0:00  1.000
     Repl    0:00   0:00   0:00   0:00      -
     Infer   6:03   7:55   9:47   1:52  1.000
     Uniq    2:51   4:42   6:33   1:51      -
     Write   0:08   0:08   0:08   0:00  1.125
     Total  13:01  13:02  13:03   0:01  0.783
 4   Load    0:17   0:17   0:18   0:00  0.944
     Repl    0:00   0:00   0:01   0:00      -
     Infer   6:04   7:38   9:49   1:22  0.997
     Uniq    3:29   5:39   7:13   1:21      -
     Write   0:08   0:08   0:09   0:00  1.000
     Total  13:43  13:43  13:45   0:01  0.743
 8   Load    0:16   0:17   0:18   0:01  0.944
     Repl    0:00   0:01   0:02   0:01      -
     Infer   6:05   7:24   9:59   1:03  0.980
     Uniq    3:58   6:32   7:51   1:02      -
     Write   0:09   0:10   0:10   0:00  0.900
     Total  14:23  14:24  14:25   0:00  0.709
16   Load    0:15   0:19   0:24   0:03  0.708
     Repl    0:00   0:05   0:09   0:03      -
     Infer   4:42  12:41  20:11   6:28  0.485
     Uniq    7:14  14:46  22:43   6:27      -
     Write   0:12   0:14   0:16   0:01  0.562
     Total  28:02  28:05  28:08   0:02  0.363
32   Load    1:19   1:29   1:44   0:07  0.163
     Repl    0:00   0:15   0:25   0:07      -
     Infer   3:50  12:23  32:54   8:16  0.297
     Uniq   11:18  31:48  40:19   8:14      -
     Write   0:15   0:29   0:40   0:09  0.225
     Total  46:13  46:23  46:36   0:09  0.219
64   Load    0:20   0:26   0:33   0:03  0.515
     Repl    0:00   0:07   0:13   0:03      -
     Infer   4:53  15:19  39:25   7:59  0.248
     Uniq   15:29  39:33  49:56   7:57      -
     Write   0:29   0:45   0:57   0:07  0.158
     Total  55:55  56:10  56:20   0:07  0.181

Figure 5.7: Data Scaling, Mastiff, Par-CoreRDFS, LUBM10K. Panels: (a) Times (seconds); (b) Growth Efficiency.

Figure 5.8: Data Scaling, Mastiff, Par-MemOWL2, LUBM10K. Panels: (a) Times (seconds); (b) Growth Efficiency.

Figure 5.9: Data Scaling, Mastiff, Par-CoreRDFS, BTC2012. Panels: (a) Times (seconds); (b) Growth Efficiency.

This could be explained by the fact that each processor's local input dataset is sorted
as a result of having performed the replication phase as a preprocessing step, and
thus loading the triples into a std::set is faster. Of course, the repl phase, which
is really just a barrier, is also faster.
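
To illustrate why sorted input can make loading into a std::set cheaper, the sketch below inserts each triple with a hint at end(). The thesis does not state whether its loader passes an insertion hint, so this is only one plausible mechanism; the Triple type and load_triples function are hypothetical names introduced for the example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical encoded triple: (subject, predicate, object) as 8-byte integers.
using Triple = std::array<std::uint64_t, 3>;

// Loads triples into an ordered set. With a hint of end(), insertion is
// amortized constant time whenever the new triple is greater than every
// triple already present, which is always the case for sorted, distinct
// input. Unsorted input is still handled correctly, only more slowly.
std::set<Triple> load_triples(const std::vector<Triple>& input) {
    std::set<Triple> triples;
    for (const Triple& t : input) {
        triples.insert(triples.end(), t);
    }
    return triples;
}
```

Even without a hint, sorted input tends to touch the same part of the tree on successive insertions, which may also help cache behavior.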

Trends can be observed in figure 5.7. Note that the infer phase appears to keep
its time fairly constant until 16-32 processors. At 64 processors, growth efficiency
for the infer phase is a mere 0.418. This is an expected result, and it confirms
two things. First, the LUBM10K dataset is fairly evenly distributed with respect to
Par-CoreRDFS inference. If it were not, then growth efficiency would change more
drastically (in either direction) on lower numbers of processors. Second, since the
LUBM10K dataset is fairly evenly distributed, contention for resources is the
only explanation for the decrease in growth efficiency.

For  Par-MemOWL2  inference  on  LUBM10K,  the  metrics  are  given  in  table 
5.12  and  visualized  in  figure  5.8.  The  infer  phase  takes  nearly  the  same  amount 
of  time  up  to  four  processors,  and  after  that,  the  time  begins  to  increase.  This 
makes  sense  because,  at  eight  processors,  sharing  of  L2  cache  begins.  However,  the 
significant  increase  in  time  was  not  observed  for  Par-CoreRDFS  inference  until  after 
16 processors. This suggests that with the smaller number of inferences for
Par-CoreRDFS inference, processors can take greater advantage of their (independent)
L1 caches and more efficiently share their L2 caches. With Par-MemOWL2, there
are  more  inferences,  and  so  cache  contention  increases.  This  is  clearly  observed  by 
the  steady  decline  of  growth  efficiency  from  four  to  64  processors. 

For Par-CoreRDFS inference on BTC2012, the metrics are given in table 5.13
and visualized in figure 5.9. The infer phase is nearly constant up to eight processors,
and then the time increases drastically after that. Although cache contention is a
factor, this behavior seems abnormal compared to inference on LUBM10K. Figure
5.9b shows that from eight to 16 processors, there is a steep drop in growth efficiency,
much steeper than was witnessed with LUBM10K. An explanation could be skew
in the data. Data scaling on the Blue Gene/Q will help to determine the root cause
since scaling the number of nodes does not increase cache contention.
5.2.2.2  Blue Gene/Q

Unlike with Mastiff, data scaling is more sensible on a distributed memory
architecture in which increasing the number of independent²⁸ nodes can handle
larger datasets without concern for contention of shared resources.

Times  for  inference  on  LUBM10K  are  reported  down  to  hundredths  of  a  second 
in  order  to  preserve  precision  in  the  trends.  Due  to  a  bug  in  the  timing  code, 
standard  deviation  of  times  is  not  reported  for  LUBM10K. 

The metrics for data scaling Par-CoreRDFS inference on LUBM10K are given
in table 5.14. As is expected, figure 5.10 indicates nearly perfect data scaling for the
infer phase. For Par-MemOWL2 (metrics in table 5.15), figure 5.11 also indicates
good data scaling, but not close to perfect. This is difficult to explain because data
scaling is perfect from 8,192 to 16,384 processors, so this would suggest that, given
the even distribution of LUBM data and the fact that inference is embarrassingly
parallel, it should also hold for higher numbers of processors. However, figure 5.11b
contradicts this intuition with a drop in growth efficiency from 16,384 to 32,768
processors. Further investigation is needed and is left as future work.

Turning to BTC2012, the metrics for data scaling Par-CoreRDFS inference
are given in table 5.16. Figure 5.12a shows that time for the infer phase remains
essentially constant from 512 to 1,024 processors and then suddenly increases from
1,024 to 2,048 processors. Growth efficiency for the infer phase in figure 5.12b is poor
for 2,048 processors at a mere 0.442. This is the result of skew in the distribution
of data in BTC2012. Specifically, it appears that the second half of the dataset
contains more work than the first half. Here the sorted order of the dataset has hurt
again. Recall, though, that were I to use static random allocation, memory would
be exhausted, and inference would be infeasible. Future work includes further study
on load-balancing including experiments using Urbani's approach [51]. Considering
the high-performance network of the Blue Gene/Q, the speed-dating approach of
Kotoulas et al. is also a good candidate for consideration [47].
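
The skew described above can be made concrete by comparing the slowest processor with the average. The ratio below is an illustrative diagnostic only, not a metric used in this thesis, and both function names are hypothetical. Applied to the infer phase in table 5.16, it gives roughly 1.4 at 1,024 processors (15:30 against 11:05) and roughly 3.2 at 2,048 processors (35:02 against 10:53).

```cpp
#include <cassert>

// Converts a table entry of the form m:ss into seconds.
double to_seconds(int minutes, int seconds) {
    return 60.0 * minutes + seconds;
}

// Ratio of the slowest processor's time to the average time.
// A value of 1.0 means the work was perfectly balanced; larger values
// mean that some processors did far more work than others.
double imbalance_ratio(double max_seconds, double avg_seconds) {
    assert(avg_seconds > 0.0 && max_seconds >= avg_seconds);
    return max_seconds / avg_seconds;
}
```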

²⁸ Independent with respect to embarrassingly parallel computation.
Table 5.14: Data Scaling, Blue Gene/Q, Par-CoreRDFS, LUBM10K

    p   Task        Min       Avg       Max  GEff_8192
 8192   Load   00:00.25  00:03.51  00:09.06      1.000
        Repl   00:00.18  00:05.37  00:09.02          -
        Infer  00:00.85  00:00.88  00:00.96      1.000
        Uniq   00:03.23  00:03.28  00:03.35          -
        Write  00:00.57  00:01.83  00:02.45      1.000
        Total  00:14.36  00:15.62  00:16.24      1.000
16384   Load   00:00.24  00:03.58  00:05.94      1.523
        Repl   00:00.15  00:02.02  00:05.80          -
        Infer  00:00.85  00:00.89  00:00.98      0.980
        Uniq   00:03.69  00:03.76  00:03.83          -
        Write  00:01.52  00:04.94  00:05.99      0.409
        Total  00:12.73  00:16.16  00:17.21      0.944
32768   Load   00:00.26  00:05.91  00:07.49      1.210
        Repl   00:00.79  00:01.76  00:06.95          -
        Infer  00:00.43  00:00.83  00:00.99      0.970
        Uniq   00:03.97  00:04.12  00:04.50          -
        Write  00:03.47  00:11.54  00:13.81      0.177
        Total  00:17.41  00:25.48  00:27.76      0.585

Table 5.15: Data Scaling, Blue Gene/Q, Par-MemOWL2, LUBM10K

    p   Task        Min       Avg       Max  GEff_8192
 8192   Load   00:00.21  00:03.46  00:05.43      1.000
        Repl   00:00.13  00:01.63  00:05.39          -
        Infer  00:02.96  00:03.05  00:03.19      1.000
        Uniq   00:03.80  00:03.92  00:04.03          -
        Write  00:00.56  00:01.87  00:02.52      1.000
        Total  00:13.67  00:14.98  00:15.63      1.000
16384   Load   00:00.25  00:03.86  00:09.48      0.573
        Repl   00:00.17  00:04.92  00:09.28          -
        Infer  00:02.96  00:03.05  00:03.18      1.005
        Uniq   00:04.38  00:04.48  00:04.59          -
        Write  00:01.54  00:04.93  00:05.52      0.458
        Total  00:19.45  00:22.85  00:23.44      0.667
32768   Load   00:00.16  00:05.46  00:07.41      0.733
        Repl   00:00.91  00:01.88  00:07.60          -
        Infer  00:01.55  00:02.99  00:03.57      0.893
        Uniq   00:04.70  00:05.28  00:06.70          -
        Write  00:02.65  00:10.71  00:15.52      0.163
        Total  00:20.49  00:28.54  00:33.33      0.469

Figure 5.10: Data Scaling, Blue Gene/Q, Par-CoreRDFS, LUBM10K. Panels: (a) Times (seconds); (b) Growth Efficiency.

Figure 5.11: Data Scaling, Blue Gene/Q, Par-MemOWL2, LUBM10K. Panels: (a) Times (seconds); (b) Growth Efficiency.

Figure 5.12: Data Scaling, Blue Gene/Q, Par-CoreRDFS, BTC2012. Panels: (a) Times (seconds); (b) Growth Efficiency.

Table 5.16: Data Scaling, Blue Gene/Q, Par-CoreRDFS, BTC2012

   p   Task     Min    Avg    Max    Dev  GEff_512
 512   Load    0:03   0:05   0:07   0:01     1.000
       Repl    0:00   0:02   0:04   0:01         -
       Infer   8:11  11:24  15:30   1:23     1.000
       Uniq    4:10   8:17  11:29   1:22         -
       Write   0:01   0:01   0:02   0:00     1.000
       Total  19:47  19:48  19:51   0:01     1.000
1024   Load    0:03   0:05   0:07   0:01     1.000
       Repl    0:00   0:03   0:05   0:01         -
       Infer   8:09  11:05  15:30   1:15     1.000
       Uniq    4:12   8:37  11:33   1:14         -
       Write   0:00   0:01   0:04   0:00     0.500
       Total  19:50  19:51  19:54   0:01     0.997
2048   Load    0:00   0:03   0:04   0:01     1.750
       Repl    0:01   0:03   0:06   0:01         -
       Infer   8:02  10:53  35:02   1:44     0.442
       Uniq   10:25  34:35  37:25   1:44         -
       Write   0:01   0:01   0:03   0:00     0.667
       Total  45:34  45:35  45:54   0:01     0.432

5.3  Summary 

This evaluation set out to test the scalability that can be achieved with restricted
rulesets such that correct (embarrassingly) parallel inference is possible.
Using LUBM10K, a very high degree of scalability was achieved for Par-CoreRDFS
and Par-MemOWL2 inference in both a strong scaling and data scaling scenario.
The Karp-Flatt metric revealed that inference on LUBM10K is almost completely
parallel in strong scaling, and growth efficiency revealed that good data scaling is
also achievable.
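
For reference, the Karp-Flatt metric mentioned above is the experimentally determined serial fraction e = (1/s - 1/p) / (1 - 1/p), where s is the measured speedup on p processors. The following minimal sketch computes it; the function name is hypothetical and the code is not taken from the evaluation harness.

```cpp
#include <cassert>

// Karp-Flatt metric: the experimentally determined serial fraction of a
// computation, given the measured speedup on a number of processors.
// A value near 0 means the computation is almost completely parallel;
// a value of 1 means no benefit was gained from the extra processors.
double karp_flatt(double speedup, int processors) {
    assert(processors > 1 && speedup > 0.0);
    const double inv_p = 1.0 / processors;
    return (1.0 / speedup - inv_p) / (1.0 - inv_p);
}
```

A speedup of 64 on 64 processors gives a serial fraction of 0, while a speedup of 32 on 64 processors gives 1/63, or about 0.016.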

The importance of the results with LUBM10K is that they concretely demonstrate
that the theoretical results of highly parallel and scalable inference from previous
chapters are achievable in practice and not merely some unachievable mathematical
ideal. However, LUBM10K is ideal. It does not contain many of the
practical difficulties that are encountered in real-world datasets.

Therefore, parallel inference was also performed on the BTC2012 dataset.
Par-MemOWL2 inference on BTC2012 resulted in memory exhaustion, and so it appears
that in some cases, a huge amount of memory is necessary for parallel inference to
even be feasible. Par-CoreRDFS inference on BTC2012 was feasible and demonstrated
excellent strong scaling up to 64 processors on Mastiff, an SMP system.
However, poor scalability was observed on the Blue Gene/Q. This is attributed to
skew in the dataset that causes some processors to need far more memory than
others. On Mastiff, this was not an issue because processors needing more memory
could use memory not used by processors needing less memory. In the distributed
environment of the Blue Gene/Q, such was not possible, and the processors needing
a large amount of memory were likely pushing their capacities and hurting performance.

These problems could theoretically be solved by effective load-balancing. Using
a simple solution like static random allocation, though, actually hurts overall
performance and increases memory consumption in contrast to the natural subject
grouping of the datasets as generated, as discussed in section 5.2.1.1. Load-balancing
is left as future work. Candidate approaches include those used by Urbani [51] and
Kotoulas et al. [47].

In short, parallel inference on Semantic Web data is memory-intensive and
relies heavily on load-balancing for scalability. If these issues are solvable, then it
appears from the evaluation on LUBM10K that a high degree of scalability can be
achieved.

Additionally, using a Blue Gene/Q, inference for interesting Semantic Web
rulesets has been demonstrated on 1.33 billion LUBM triples in around 30 seconds and
on the BTC2012 dataset in around 45 minutes. The fastest time for Par-CoreRDFS
inference on LUBM10K reported herein is 28 seconds, a few seconds faster than the
current record (to the best of my knowledge) of 31 seconds achieved on a Cray XMT
[82] by Goodman and Mizell [33]. To the best of my knowledge, this work is also
the first to have performed any kind of well-defined, complete closure²⁹ on a BTC
dataset, although Williams et al. [44] performed a kind of partial RDFS closure on
the BTC2009 dataset.


²⁹ To be clear, this is the complete Par-CoreRDFS closure with respect to the operational
semantics from chapter 3 and not the model-theoretic semantics of RDFS from [17]. The former
is a specific subset of the latter.


CHAPTER  6 
CONCLUSION 

Inference on the Semantic Web continues to be a problem due in part to the large
volume of data involved. This thesis has addressed the problem from a perspective
of data parallelism, attempting to scale production rule inference to larger datasets
by adding more processors. However, the achievable degree of parallelism is not
merely a function of clever implementation. The rules and data also directly impact
the degree of parallelism that can be achieved, as demonstrated in chapter 5.

Previous work focused on determining restrictions on the data such that correctness
of parallel inference could be ensured, but in reality, on the Web, no such
restrictions can be reasonably enforced. Thus, it makes more sense to consider
conditions and restrictions on the rules. Another way to look at this is that, instead of
maximizing expressivity and suffering whatever performance penalties come with
it, one can fix the requisite performance characteristics and instead suffer a loss
of expressivity.

In chapter 3, the definitions and operational semantics for the production
rules under consideration were given, and definitions for parallel inference were
introduced. Following that, sufficient conditions were determined for ground rules
such that, when the conditions are met for every rule instance of a rule in a ruleset,
parallel inference is correct with respect to a distribution scheme. These findings
are restricted to a certain class of rulesets referred to as polarized rulesets in which
each rule has either only assert actions or only retract actions. They are also
restricted to a certain class of CRSs called RAOCs in which all rule instances are fired
in every cycle such that retractions precede assertions. These conditions were then
generalized to rules (not just ground rules) and patterns (instead of individual facts)
so that rules could be directly tested.

In chapter 4, a specific form of distribution schemes was considered called
replication schemes. Using replication schemes is significantly simpler and has nice
properties such as triviality of implementing parallel inference and ease of
specification. The sufficient conditions for rules were then reformulated to be specific to
replication schemes, which led to the observation that testing these new conditions
is reducible to satisfiability, and not just SAT, but specifically 2SAT. The 2SAT
reduction is useful for testing conditions and for deriving replication schemes that
support correct parallel inference for a (polarized) ruleset (with a RAOC), but many
interesting rulesets will have only a single solution: to replicate all facts to all
processors. Therefore, the 2SAT reduction was augmented into a 3SAT reduction that
allows for the possibility to eliminate rules in order to improve parallelization. The
downside, though, is that even for moderately sized rulesets, the search space for
solutions to the 3SAT formulas can be quite large. Therefore, a methodology was
also given to aid in deriving restricted rulesets amenable to parallel inference. This
methodology was then used to derive restricted versions of the RDFS and OWL2RL
rulesets.
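
The satisfiability test referred to above can be illustrated with the standard implication-graph procedure for 2SAT. The sketch below is a generic solver only: how rules and replication choices are encoded as two-literal clauses is defined in chapter 4 and is not reproduced here, and the names lit and two_sat_satisfiable are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Literal encoding (an assumption of this sketch): variable v is 2*v and
// its negation is 2*v + 1, so x ^ 1 negates any literal x.
int lit(int var, bool negated) { return 2 * var + (negated ? 1 : 0); }

// Returns true iff the conjunction of the given two-literal clauses is
// satisfiable. Each clause (a or b) contributes the implications !a -> b
// and !b -> a. The formula is unsatisfiable exactly when some variable and
// its negation lie in the same strongly connected component (found here
// with Kosaraju's two-pass method).
bool two_sat_satisfiable(int num_vars,
                         const std::vector<std::pair<int, int>>& clauses) {
    assert(num_vars >= 0);
    const int n = 2 * num_vars;
    std::vector<std::vector<int>> fwd(n), rev(n);
    for (const auto& c : clauses) {
        const int a = c.first, b = c.second;
        fwd[a ^ 1].push_back(b);  rev[b].push_back(a ^ 1);
        fwd[b ^ 1].push_back(a);  rev[a].push_back(b ^ 1);
    }
    // Pass 1: record vertices in order of DFS completion on the forward graph.
    std::vector<char> seen(n, 0);
    std::vector<int> order;
    std::function<void(int)> visit = [&](int u) {
        seen[u] = 1;
        for (int w : fwd[u]) if (!seen[w]) visit(w);
        order.push_back(u);
    };
    for (int u = 0; u < n; ++u) if (!seen[u]) visit(u);
    // Pass 2: label components on the reversed graph, latest finisher first.
    std::vector<int> comp(n, -1);
    std::function<void(int, int)> label = [&](int u, int c) {
        comp[u] = c;
        for (int w : rev[u]) if (comp[w] < 0) label(w, c);
    };
    int num_comps = 0;
    for (int i = n - 1; i >= 0; --i)
        if (comp[order[i]] < 0) label(order[i], num_comps++);
    for (int v = 0; v < num_vars; ++v)
        if (comp[2 * v] == comp[2 * v + 1]) return false;
    return true;
}
```

The procedure runs in time linear in the number of variables and clauses, which is what makes the 2SAT formulation attractive compared with general SAT. The recursive traversal is chosen for brevity; a very large formula would call for an iterative one.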

In chapter 5, an evaluation was performed to demonstrate the scalability that
is achievable using restricted versions of the RDFS and OWL2RL rulesets. Two
large datasets were used, referred to as LUBM10K and BTC2012. LUBM10K
is an unrealistic, synthetic dataset of over 1.3 billion triples, and BTC2012 is a
real-world dataset crawled from the Web containing over one billion triples. The
evaluation demonstrated that, when there is sufficient memory and sufficient
load-balancing is achieved, a very high degree of parallelism can be achieved. This was
made obvious by use of the Karp-Flatt metric (the experimentally determined serial
fraction of computation) and an SMP machine. The real difficulty, though, arises
in a distributed memory environment, where load-balancing and large memory per
node become essential. Scalability of inference was demonstrated on LUBM10K
up to 32,768 processors on 2,048 nodes of a Blue Gene/Q, achieving inference times
(not including other phases) of no more than a few seconds.

To pithily (and roughly) summarize the overall contribution of this thesis, I
have provided proof and methodology for determining parts of production rule
inference that are embarrassingly parallel, and I have demonstrated that when practical
issues of load-balancing and memory availability can be solved, a high degree of
scalability can be achieved. Future work should focus on addressing the practical
issues that prevent achieving a high degree of scalability, and addressing portions of
inference that are not embarrassingly parallel.

6.1  Future  Work 

Although  this  thesis  constitutes  a  significant  contribution  toward  webscale 
inference,  it  is  really  only  a  first  step  in  the  right  direction.  Conditions  have  been 
determined  under  which,  as  has  been  demonstrated,  highly  scalable  inference  is 
feasible.  However,  these  conditions  are  quite  restrictive,  e.g.  disallowing  functional 
and  inverse  functional  properties  from  OWL2RL.  The  reason  is  that  throughout  this 
work,  one  characteristic  has  been  held  fixed:  embarrassingly  parallel  computation. 
That  is,  processors  have  not  been  allowed  to  communicate  with  each  other  during 
inference. 

The  natural,  next  step  is  then  to  introduce  communication  in  order  to  handle 
rules  that  had  to  be  eliminated  in  order  to  preserve  correct,  (embarrassingly)  parallel 
inference.  In  this  case,  the  3SAT  reduction  can  simply  be  reinterpreted.  Specifically, 
instead  of  interpreting  y(r)  to  mean  the  elimination  of  rule  r,  let  it  instead  mean 
that  (some  of)  the  inferences  of  r  must  be  (dynamically)  replicated. 

The  evaluation  in  chapter  5,  while  sufficient  for  the  purposes  of  this  thesis,  left 
much  to  be  desired.  Consideration  of  a  “middle  ground”  dataset  would  likely  be 
helpful  to  those  who  have  real-world  data  that  is  not  quite  so  difficult  as  BTC2012. 
As  mentioned  several  times,  investigation  into  effective  load-balancing  is  needed 
for  distributed  memory  architectures.  Additionally,  for  inference  over  BTC2012, 
the  computational  power  of  the  Blue  Gene/Q  was  severely  underutilized.  A  more 
parallel-aware  inference  engine  utilizing  Pthreads  or  OpenMP  would  resolve  this 
issue. 

Some  supercomputers  are  not  amenable  to  embarrassing  parallelism  and  are 
optimized  for  cases  that  necessitate  interaction  of  processors.  For  example,  the  Cray 
XMT  [82]  has  large  shared  memory  without  a  typical  cache  hierarchy,  and  so  dividing 
up  the  problem  to  be  local  to  individual  processors  does  not  result  in  significant 
performance  gains  relative  to  more  typical  architectures.  However,  the  Cray  XMT 
(hardware)  processor  is  designed  to  avoid  penalties  from  accessing  memory,  and  so 
its strength relative to more typical architectures is in problems that are the opposite
of  embarrassing  parallelism.  Thus,  parallelizing  inference  for  such  architectures  is  a 
very  different  problem  and  requires  an  entirely  different  perspective  and  approach 
than  presented  in  this  thesis. 

Finally,  as  mentioned,  I  was  not  able  to  perform  Par-MemOWL2  inference 
on  the  BTC2012  dataset  due  to  an  explosion  of  inferences  that  quickly  exhausted 
memory. Hogan et al. surmised that this can be due (at least in part) to abuse
of  ontologies,  termed  “ontology  hijacking”  [19,  48].  They  propose  restrictions  on 
inference  in  order  to  avoid  such  abuses,  thus  reducing  the  number  of  inferences. 
Such  approaches  are  likely  necessary,  and  more  are  needed.  Therefore,  it  remains  as 
future  work  to  determine  root  causes  for  the  explosion  of  inferences,  and  to  figure 
out  appropriate  ways  to  cope  with  them. 
APPENDIX  A

DICTIONARY  ENCODING 

As with all the code, the dictionary encoding code was written in C++, and MPI
was used for interprocess communication. All (non-collective) communication and
disk I/O was performed using asynchronous MPI calls. The dictionary encoding
process begins by every processor collectively opening the (single) index file for the
LZO-compressed data and using the offsets to divide the number of compressed
blocks fairly evenly among processors. After this, the processors collectively open
the (single) LZO-compressed N-Triples file and begin reading from the beginning of
their segments. Using the large block partition of the GPFS [83] file system at the
CCNI, page sizes are 4 MB. All reads occur along page boundaries, reading one
page at a time (except possibly in the cases of the beginning of the file segment and
the end of the file segment, which may be partial pages). MPI::File::Iread_at is
used for reading so that, while a processor is processing one page, the next page
is being read. As pages are read, blocks in those pages are LZO-decompressed, and
RDF triples are parsed. Each term in the RDF triple is independently dictionary
encoded.
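
The division of compressed blocks among processors can be sketched as follows. The thesis says only that blocks are divided "fairly evenly", so the exact policy here (contiguous ranges, with the first few ranks taking one extra block) is an assumption, and BlockRange and block_range are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Half-open range [begin, end) of compressed block indexes for one rank.
struct BlockRange {
    std::size_t begin;
    std::size_t end;
};

// Splits num_blocks contiguous blocks among num_procs ranks so that range
// sizes differ by at most one. The first (num_blocks % num_procs) ranks
// receive one extra block each.
BlockRange block_range(std::size_t num_blocks, int num_procs, int rank) {
    assert(num_procs > 0 && rank >= 0 && rank < num_procs);
    const std::size_t p = static_cast<std::size_t>(num_procs);
    const std::size_t r = static_cast<std::size_t>(rank);
    const std::size_t base = num_blocks / p;
    const std::size_t extra = num_blocks % p;
    const std::size_t begin = r * base + std::min(r, extra);
    const std::size_t length = base + (r < extra ? 1 : 0);
    return BlockRange{begin, begin + length};
}
```

With 10 blocks and 4 processors, the ranks receive blocks [0, 3), [3, 6), [6, 8), and [8, 10). In the actual encoder, the byte offsets from the index file would then translate each block range into a file segment.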

Skipping for a moment the actual dictionary encoding process, when a triple
is finished being dictionary-encoded - that is, each term is replaced with a uniquely
identifying 8-byte integer - the encoded triple is written into a 4 MB buffer.
When the 4 MB buffer is full, it is written to an output file. Each processor
opens its own output file in its own directory (to prevent contention over
directory metadata) in the MPI::COMM_SELF communicator, and pages are written
using the split-collective MPI routines MPI::File::Write_at_all_begin and
MPI::File::Write_at_all_end. Using split-collective routines here is somewhat
of an odd choice because the communicator in which the calls are being made is
MPI::COMM_SELF, which contains only the calling processor. Thus, there is really no
need for a collective routine. These routines were used because they are part of a
generic object for writing files in MPI that could be used with multiple processors
as well.
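
The buffering of encoded triples described above can be sketched without MPI by handing full buffers to a caller-supplied sink. This is an illustration under stated assumptions: the TripleBuffer class and its sink callback are hypothetical, the capacity is a constructor parameter (4 MB in the thesis), and because a 24-byte triple does not evenly divide 4 MB, the sketch flushes whenever the next triple would not fit, which may differ from the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Three 8-byte term identifiers per encoded triple.
constexpr std::size_t kTripleBytes = 3 * sizeof(std::uint64_t);

class TripleBuffer {
public:
    using Sink = std::function<void(const std::vector<unsigned char>&)>;

    // capacity_bytes: buffer size; sink: called with each full buffer
    // (in the thesis this role is played by a write to the output file).
    TripleBuffer(std::size_t capacity_bytes, Sink sink)
        : capacity_(capacity_bytes), sink_(std::move(sink)) {
        assert(capacity_ >= kTripleBytes);
        bytes_.reserve(capacity_);
    }

    // Appends one encoded triple, flushing first if it would not fit.
    void append(std::uint64_t s, std::uint64_t p, std::uint64_t o) {
        if (bytes_.size() + kTripleBytes > capacity_) flush();
        const std::uint64_t terms[3] = {s, p, o};
        const unsigned char* raw =
            reinterpret_cast<const unsigned char*>(terms);
        bytes_.insert(bytes_.end(), raw, raw + kTripleBytes);
    }

    // Hands any buffered bytes to the sink; call once more at end of input.
    void flush() {
        if (bytes_.empty()) return;
        sink_(bytes_);
        bytes_.clear();
    }

private:
    std::size_t capacity_;
    Sink sink_;
    std::vector<unsigned char> bytes_;
};
```

The triples are stored in the host byte order, so a reader on a machine with different endianness would need to convert them.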

Regarding  the  dictionary  encoding  process,  there  are  two  distinct  parts  of  the 
computation:  the  actual  dictionary  encoding,  and  the  interprocess  communication. 
The  interprocess  communication  has  three  layers.  From  lowest  level  to  highest  level 
they  are  the  packet  distributor,  the  string  distributor,  and  the  controller. 

A distributor (packet or string) has four main methods: send, noMoreSends,
receive, and done. The send method takes a processor rank and a string of bytes
and returns true if the bytes are being sent (not already sent) to the processor
identified by the given rank. Otherwise, false is returned. Calling noMoreSends
tells the distributor that send and noMoreSends will not be called again;
calling either one afterward causes an exception to be thrown. Calling receive
(no parameters) returns a string of bytes or NULL. NULL indicates that nothing
could be immediately received, but not that there is nothing incoming. Finally,
there is the done method, which takes no parameters and returns a boolean value.
The done method is a bit tricky because it is, by the interface definition of
distributor, collective. That is, when one processor calls done, it should be
expected that the processor will wait for all other processors to also call done
before returning. Thus it is a potential source of deadlock. A processor should
periodically call done even if it knows it is not done, just to ensure that
other processors are not trapped in a call to done. If done returns true, then
all processors have finished distribution, and no more calls should be made to
distributor methods. The done method cannot return true until every processor
has called noMoreSends and all the outstanding messages have been received.
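
To make the interface concrete, the following is a minimal C++ sketch of the four methods together with a single-process stand-in. The class names Distributor and LoopbackDistributor are hypothetical and not taken from the implementation, a bool return with an output parameter stands in for returning NULL from receive, and the stand-in's done is not collective because only one process exists.

```cpp
#include <cassert>
#include <deque>
#include <stdexcept>
#include <string>

// Hypothetical interface mirroring the four methods described above.
class Distributor {
public:
    virtual ~Distributor() {}
    // Returns true if the bytes were accepted for sending to `rank`.
    virtual bool send(int rank, const std::string& bytes) = 0;
    // Promises that send and noMoreSends will not be called again.
    virtual void noMoreSends() = 0;
    // Returns false (standing in for NULL) if nothing could be received
    // immediately; otherwise fills `out` with the received bytes.
    virtual bool receive(std::string& out) = 0;
    // Collective in a real implementation; true once everyone is finished.
    virtual bool done() = 0;
};

// Single-process stand-in: the only valid rank is 0 and messages loop back.
class LoopbackDistributor : public Distributor {
    std::deque<std::string> inFlight_;
    bool closed_ = false;
public:
    bool send(int rank, const std::string& bytes) override {
        if (closed_) throw std::logic_error("send after noMoreSends");
        if (rank != 0) throw std::out_of_range("only rank 0 exists");
        inFlight_.push_back(bytes);
        return true;
    }
    void noMoreSends() override {
        if (closed_) throw std::logic_error("noMoreSends called twice");
        closed_ = true;
    }
    bool receive(std::string& out) override {
        if (inFlight_.empty()) return false;
        out = inFlight_.front();
        inFlight_.pop_front();
        return true;
    }
    // Finished only after noMoreSends and once every message was received.
    bool done() override { return closed_ && inFlight_.empty(); }
};
```

In this stand-in, done stays false while a sent message is still waiting to be received, which mirrors the condition stated above for the real distributors.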

The packet distributor uses timewarp-like communication with parameters
similar to those in [84]. These parameters are referred to herein as the packet
size, the number of requests, and the coordination period. The packet size is
the fixed size of any given message to be sent. If send is called with a string
of bytes of a different length, then an exception is thrown. In previous work
[74], I had used variable-length messages, requiring a receiving process to
first receive a message containing the size of the subsequent message, allocate
enough space for the subsequent message, and then receive the subsequent
message. This proved to significantly inhibit parallelism when there is frequent
interprocessor communication, and using fixed-size packets is intended to remedy
that problem. The number of requests is the maximum number of asynchronous send
requests and the maximum number of asynchronous receive requests that a
processor can have. Let the number of requests be N. Upon initialization, the
packet distributor immediately makes N asynchronous receive requests using
MPI::Comm::Irecv. When receive is called, MPI::Request::Testany is used to check
for a completed receive request. If one is found, the packet (as a string of
bytes) is returned, and another MPI::Comm::Irecv is started. If no receive
request has completed, then NULL is returned. Similarly, when send is called,
MPI::Request::Testany is used to check for inactive or completed send requests.
If one is found, then a new send request is started with MPI::Comm::Isend, and
true is returned. Otherwise, false is returned.

The tricky part of the packet distributor is the handling of the call to done,
and it is in this regard that the packet distributor is like distributed
timewarp computations. The packet distributor keeps two counts: the number of
calls to done, call it the “done count”; and the number of net messages, call it
the “message count.” The done count is initialized to zero, and the message
count is initialized to one. Every time a send request is started, the message
count is incremented, and every time a receive request is completed, the message
count is decremented. Every time done is called, the done count is incremented.
If the done count is less than the coordination period, then done immediately
returns false. Otherwise, the done count is reset to zero, and the processor
calls MPI::Comm::Allreduce to sum over the processors’ message counts. If the
sum of the message counts is non-zero, false is returned. Otherwise, all receive
requests are cancelled (there are always N outstanding) and true is returned.
The important thing to understand here is how the sum of the processors’ message
counts equaling zero is an indication of having finished. When a processor calls
noMoreSends, the message count is decremented to effectively undo the
initialization to one. Therefore, letting p be the number of processors and s
the sum of the message counts, upon initialization, s = p. Prior to any
processor calling noMoreSends, s = m + p where m is the number of messages that
have been sent but not received (from the perspective of the packet
distributor). Notice that even if all sent messages have been received (m = 0),
s > 0. This is appropriate because a processor might still have more packets to
send but just has yet to tell the packet distributor. However, when the
processor calls noMoreSends, this tells the packet distributor that send will
not be called again. Letting q be the number of processors that have called
noMoreSends, then s = m + p - q. As long as q < p, it is impossible for s = 0.
Finally, when all processors have called noMoreSends, p = q and s = m. When
m = 0, then truly, there are no more outstanding messages, and no processors
will send any more messages. Therefore, distribution is finished.
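
The counting argument can be checked with a small model that involves no MPI at all. The sketch below assumes a hypothetical MessageCounter struct for the per-processor message count, and a hypothetical sumOfCounts function that stands in for the Allreduce sum; neither name comes from the implementation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of one processor's message count.
struct MessageCounter {
    long count = 1;                        // initialized to one
    void onSendStarted()      { ++count; } // a send request was started
    void onReceiveCompleted() { --count; } // a receive request completed
    void onNoMoreSends()      { --count; } // undoes the initialization to one
};

// Stand-in for the Allreduce sum over all processors' message counts.
long sumOfCounts(const std::vector<MessageCounter>& procs) {
    long s = 0;
    for (const MessageCounter& c : procs) s += c.count;
    return s;
}
```

With p = 3 counters, the sum starts at 3, rises by one for each message in flight, drops by one for each processor that calls noMoreSends, and reaches zero only when all three have done so and every message has been received, in agreement with s = m + p - q.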

The parameters to the packet distributor have a significant impact on
performance. From experience, for dictionary encoding, a packet size of 128
bytes seemed to work well (this may make more sense after the discussion of the
string distributor). When consulted about his experience with similar parameters
on a Blue Gene, professor and distributed timewarp computation expert Chris
Carothers stated that using eight and 4,096 for the timewarp parameters
analogous to number of requests and coordination period had worked well for him.
Starting with these values, neighboring values were also tested, none of which
appeared to improve performance (and some of which worsened it). Therefore, when
using a packet distributor (which will be discussed again later), number of
requests was always set to eight, and the coordination period was always set to
4,096. Clearly, these parameters are architecture dependent. One must be
particularly careful with number of requests, coordination period, and the
manner in which the done method is periodically called. Allowing too many send
requests and a long coordination period can cause the underlying MPI message
buffers to quickly consume memory. Even if the parameters are reasonably set,
though, if the done method is not called with regularity on all processors, one
processor can fall behind on receives or race ahead on sends, and the result
will be the same.

The string distributor operates on top of a packet distributor and is
responsible for relaying variable-length messages. It breaks down messages into
individual packets. Letting the packet size be n, if send is called with a
message of length less than or equal to n - sizeof(size_t), then the length of
the message is written into the first sizeof(size_t) bytes of a packet, and the
message is written in the remainder of the packet. The string distributor then
calls send on the packet distributor with that packet. However, if the message
to be sent is larger than n - sizeof(size_t), then the message is broken down
into multiple packets. The packets will have a header of
sizeof(size_t) + sizeof(int) + 8 bytes where the first sizeof(size_t) bytes
store the overall message length, the next sizeof(int) bytes identify the
sending processor, the following four bytes contain a message identifier, and
the last four bytes contain the relative placement of the packet among the other
packets for the same message. The string distributor then calls send on the
underlying packet distributor to send all the packets. Recall, though, that the
packet distributor can refuse to send a packet by returning false. In this case,
the string distributor buffers the packets. Then true is returned. Later, when
send or done is called, the string distributor attempts to send the packets
again. In the case of send, if any of the buffered packets could not be sent,
then false is returned. That is, the string distributor will only buffer one
message at a time and will refuse other messages until it is able to send the
buffered packets. If all the buffered packets are successfully sent (to the
packet distributor), then the string distributor will attempt to send the new
message as well.
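
The packet layout just described can be sketched as follows. The function name packetize, the helper appendRaw, the zero padding of unused bytes, and the use of the host's native byte order are all assumptions made for illustration; the text above fixes only the header fields and their sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Appends the raw bytes of a trivially copyable value to a packet.
template <typename T>
void appendRaw(std::string& packet, const T& value) {
    packet.append(reinterpret_cast<const char*>(&value), sizeof(T));
}

// Splits `message` into packets of exactly `n` bytes (zero padded).
// `sender` and `messageId` are written only for multi-packet messages.
std::vector<std::string> packetize(const std::string& message, std::size_t n,
                                   int sender, std::uint32_t messageId) {
    const std::size_t shortHeader = sizeof(std::size_t);
    const std::size_t longHeader = sizeof(std::size_t) + sizeof(int) + 8;
    assert(n > longHeader);  // every packet must carry some payload
    const std::size_t length = message.size();
    std::vector<std::string> packets;
    if (length <= n - shortHeader) {
        // Single packet: length followed by the message itself.
        std::string packet;
        appendRaw(packet, length);
        packet += message;
        packet.resize(n, '\0');
        packets.push_back(packet);
        return packets;
    }
    // Multiple packets: length, sender, message identifier, position.
    const std::size_t payload = n - longHeader;
    std::uint32_t position = 0;
    for (std::size_t offset = 0; offset < length; offset += payload, ++position) {
        std::string packet;
        appendRaw(packet, length);
        appendRaw(packet, sender);
        appendRaw(packet, messageId);
        appendRaw(packet, position);
        packet += message.substr(offset, payload);
        packet.resize(n, '\0');
        packets.push_back(packet);
    }
    return packets;
}
```

A receiver can tell the two layouts apart from the leading length field alone, since a length of at most n - sizeof(size_t) implies the single-packet form.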

Receiving messages in the string distributor is a bit like putting together a
puzzle. A std::multimap is used to collect incoming packets. Each key of the
std::multimap is the combination (treated as a unit) of message length, sender
rank, and message identifier. When receive is called, the string distributor
calls receive on the underlying packet distributor for another packet. If no
packet is available, the string distributor returns NULL. If a packet is
received, the string distributor checks to see if it is the last packet for a
message, and if so, composes the message (it could be just the single packet)
and returns it. Otherwise, it places the packet into the std::multimap and
returns NULL.
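
A minimal sketch of this reassembly step is given below. It assumes the packet header has already been parsed into a hypothetical Fragment struct, and it treats a packet as the “last” one when the number of collected fragments reaches the number implied by the message length; the class name Reassembler and that completion test are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// One decoded multi-packet fragment (header fields already parsed out).
struct Fragment {
    std::size_t length;       // overall message length
    int sender;               // rank of the sending processor
    std::uint32_t messageId;  // message identifier
    std::uint32_t position;   // placement of this fragment in the message
    std::string payload;      // fragment bytes without padding
};

class Reassembler {
    typedef std::tuple<std::size_t, int, std::uint32_t> Key;
    std::multimap<Key, std::pair<std::uint32_t, std::string> > pending_;
    std::size_t payloadSize_;  // payload bytes carried by a full fragment
public:
    explicit Reassembler(std::size_t payloadSize) : payloadSize_(payloadSize) {
        assert(payloadSize > 0);
    }

    // Stores the fragment; returns true and fills `message` once all
    // fragments of its message have arrived.
    bool add(const Fragment& f, std::string& message) {
        const Key key(f.length, f.sender, f.messageId);
        pending_.insert(std::make_pair(key, std::make_pair(f.position, f.payload)));
        const std::size_t needed = (f.length + payloadSize_ - 1) / payloadSize_;
        if (pending_.count(key) < needed) return false;
        auto range = pending_.equal_range(key);
        std::vector<std::pair<std::uint32_t, std::string> > parts;
        for (auto it = range.first; it != range.second; ++it) parts.push_back(it->second);
        std::sort(parts.begin(), parts.end());  // order fragments by position
        message.clear();
        for (const auto& part : parts) message += part.second;
        pending_.erase(range.first, range.second);
        return true;
    }
};
```

Because the key combines length, sender, and identifier, fragments of different messages can interleave freely without being mixed up.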

Now when noMoreSends is called, the string distributor notes that it has been
called, but it will not call noMoreSends on the underlying packet distributor if
it has buffered packets. If it does have buffered packets, it will attempt to
send them, and if they can all be sent, it will call noMoreSends on the packet
distributor. When done is called, if there are buffered packets, the string
distributor attempts to send them. If it is successful, then it will call
noMoreSends on the packet distributor. Regardless, done is called on the packet
distributor and its value returned.

Having described the packet distributor which handles fixed-length messages,
and having described the string distributor which handles variable-length
messages, the final component to be discussed is the controller. The controller
coordinates between a distributed computation and a distributor, and so a
distributed computation must first be described.

A distributed computation has two main methods: pickup and dropoff. When
pickup is called, it is a request by the controller that the distributed
computation provide a message to be sent. If the distributed computation has a
message to send, it writes it into the provided buffer (which may be resized as
necessary) and returns the rank of the processor to which the message should be
sent. Alternatively, it may return -1 to indicate there is currently nothing to
send, or a value less than -1 to indicate that there is no more need to send
messages. A call to dropoff simply provides a received message to the
distributed computation.

The function of the controller is given in algorithm 4, which consists
primarily of two loops. In the first loop, a message is “picked up” (if
sendto ≥ 0) and the inner “receiving” loop is entered. Every iteration of the
receiving loop attempts to receive a message, and if a message is received, it
is “dropped off” to the distributed computation. The repeating condition of this
receiving loop is important. Note that || is short-circuited, meaning that the
subconditions are evaluated from left to right, and once a subcondition
evaluates to true, the entire condition is considered true and no subconditions
to the right are evaluated. The call to dist.done will always return false on
line 8, but as mentioned, it is important to call it periodically to prevent
deadlock, which is why it is placed in the loop condition. Next, if sendto < 0,
the distributed computation did not have a message to send, and so this will
short-circuit the evaluation of the loop condition and prevent dist.send from
being called. However, if sendto ≥ 0, then dist.send will be called, and only
when the send is successful will the receiving loop terminate. The first loop
(lines 1-9) will terminate when the distributed computation indicates that there
are no more messages to be sent, and dist.noMoreSends is called on line 10.
Lines 11-16 are another receiving loop which terminates when dist.done returns
true, indicating that there are no more messages being sent and no more messages
to be received.


Algorithm 4: Control between Distributed Computation and Distributor
Input: Distributor dist and distributed computation comp.

/* buffer and msg are resizable strings of bytes */
/* sendto is an integer */

 1  repeat
 2      sendto = comp.pickup(buffer)
 3      repeat
 4          msg = dist.receive()
 5          if msg ≠ NULL then
 6              comp.dropoff(msg)
 7          end
 8      until dist.done() || sendto < 0 || dist.send(sendto, buffer)
 9  until sendto < -1
10  dist.noMoreSends()
11  repeat
12      msg = dist.receive()
13      if msg ≠ NULL then
14          comp.dropoff(msg)
15      end
16  until dist.done()
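
For readers who prefer code to pseudocode, algorithm 4 might be rendered in C++ roughly as follows. This is a sketch under stated assumptions: control is a hypothetical function template, receive reports NULL by returning false, and LoopbackDist and EchoComp are toy single-process stand-ins introduced only so that the loop can be exercised without MPI.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Control loop of algorithm 4 for any distributor `dist` and distributed
// computation `comp` that provide the methods used below.
template <typename Dist, typename Comp>
void control(Dist& dist, Comp& comp) {
    std::string buffer, msg;
    int sendto;
    do {
        sendto = comp.pickup(buffer);                  // line 2
        do {
            if (dist.receive(msg)) comp.dropoff(msg);  // lines 4-7
        } while (!(dist.done() || sendto < 0 ||
                   dist.send(sendto, buffer)));        // line 8
    } while (!(sendto < -1));                          // line 9
    dist.noMoreSends();                                // line 10
    do {
        if (dist.receive(msg)) comp.dropoff(msg);      // lines 12-15
    } while (!dist.done());                            // line 16
}

// Toy distributor: everything sent loops back to the single process.
struct LoopbackDist {
    std::deque<std::string> queue;
    bool closed = false;
    bool send(int, const std::string& bytes) { queue.push_back(bytes); return true; }
    void noMoreSends() { closed = true; }
    bool receive(std::string& out) {
        if (queue.empty()) return false;
        out = queue.front();
        queue.pop_front();
        return true;
    }
    bool done() { return closed && queue.empty(); }
};

// Toy computation: sends each string to rank 0 and records what is dropped off.
struct EchoComp {
    std::vector<std::string> outgoing;
    std::vector<std::string> received;
    std::size_t next = 0;
    int pickup(std::string& buffer) {
        if (next == outgoing.size()) return -2;  // no more messages
        buffer = outgoing[next++];
        return 0;
    }
    void dropoff(const std::string& msg) { received.push_back(msg); }
};
```

Each repeat ... until loop becomes a do ... while loop on the negated condition, so the short-circuit order of line 8 is preserved exactly.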


Finally, the actual dictionary encoding can be described as a distributed
computation. Recall that triples are continually being read and encoded triples
continually being buffered and written to disk. In between, they are being
encoded. Algorithm 5 gives a high-level overview of the pickup method. On line
1, the cached_response method checks to see if there are any lookup responses
cached that need to be sent (since the pickup method is the mechanism by which
messages are sent), and if so, writes the lookup response in the buffer, writes
the requesting processor’s rank in sendto, and returns true. Otherwise,
cached_response returns false. On line 4, the next_lookup method checks to see
whether another lookup request needs to be made, and if so, sets term to the
term for which lookup is needed and returns true. Otherwise, next_lookup returns
false. On line 5, check_termination handles termination of the distributed
dictionary encoding computation, an important method which will be described in
greater detail later. On line 7, the lookup_in_progress method checks to see if
there is already an outstanding lookup request and, if so, caches term to await
completion of the request and returns true. Otherwise, lookup_in_progress
returns false. On line 10, local_lookup checks to see whether the encoded value
for term already exists in the local dictionary, and if so, encodes the term and
returns true. Otherwise, local_lookup returns false. On line 13, remote_lookup
determines which processor is responsible for encoding term, writes a lookup
request in the buffer, and returns the rank of the processor to which the
request should be sent.

The dropoff method for distributed dictionary encoding is given in algorithm
6. On line 1, message_is_no_more_requests returns true iff the buffer contains a
message stating that the sending processor will make no more requests. On line
4, message_is_no_responses_expected returns true iff the buffer contains a
message stating that the sending processor expects no more responses. On line 7,
message_is_request returns true iff buffer contains a lookup request.


Algorithm 5: Pickup Method for Distributed Dictionary Encoding
Input: A buffer buffer in which to write the message.
Output: The rank of the processor to which the message should be sent,
        or -1 to indicate no message, or less than -1 to indicate no more
        messages.

 1  if cached_response(buffer, sendto) then
 2      return sendto
 3  end
 4  if !next_lookup(term) then
 5      return check_termination(buffer)
 6  end
 7  if lookup_in_progress(term) then
 8      return -1
 9  end
10  if local_lookup(term) then
11      return -1
12  end
13  return remote_lookup(buffer, term)

Algorithm 6: Dropoff Method for Distributed Dictionary Encoding
Input: A buffer containing a message.

 1  if message_is_no_more_requests(buffer) then
 2      increment the “no more requests” counter
 3  end
 4  if message_is_no_responses_expected(buffer) then
 5      increment the “no responses expected” counter
 6  end
 7  if message_is_request(buffer) then
 8      perform the lookup request and cache the response for pickup
 9  else
        /* buffer contains a lookup response */
10      encode terms awaiting the response
11      write out any completely encoded triples
12  end


Algorithms 5 and 6 have been given to provide clarity concerning the overall
flow of the distributed computation. Greater detail is not given in the form of
pseudocode since, due to complexity, it would not be significantly clearer than
the actual source code. Instead, a written description of the important points
follows.

It is easiest to understand from the perspective of the lifecycle of a triple,
from being read, to getting encoded, to being written. A triple is read in the
call to next_lookup (after requests have been made for the previous triple and
assuming there are any triples left to be read), and immediately a “pending
triple” is created. A pending triple is an array of three 8-byte integers along
with a count called the “need” of the pending triple. The need is initialized to
three because none of the terms in the triple have been encoded yet.

Now  consider  the  lifecycle  of  a  single  term,  from  being  retrieved  from  a  triple, 
to  getting  encoded,  to  being  written  into  a  pending  triple.  This  term  is  the  term 
parameter  (by  reference)  of  next_lookup.  Skipping  over  lookup_in_progress  for  a 
moment,  on  line  10  of  algorithm  5,  the  processor  checks  its  local  dictionary  to  see 
if  it  already  has  an  ID  set  for  the  term.  If  so,  the  processor  writes  that  ID  into 
the  current  pending  triple  at  the  correct  position  and  decrements  the  need  of  the 
pending  triple.  If  the  need  is  now  zero,  then  the  pending  triple  is  fully  encoded  and 
written  to  output. 

However, if the term does not have an ID associated with it in the local
dictionary, then a remote lookup must be performed on line 13 of algorithm 5.
Each processor keeps a count of the number of lookup requests it has sent out.
When a remote lookup is performed, the current value of the count is used as the
request ID, and the count is incremented. A “pending position” is then created,
which consists of a pointer to the current pending triple and an integer for the
position of the term being looked up (zero, one, or two). The request ID is then
associated with the pending position in a std::multimap. A hash function is used
to determine which processor is responsible for encoding the term.

Eventually, that processor receives the lookup request in a call to dropoff.
Skipping over lines 1-6 of algorithm 6 for a moment, on line 7, the processor
determines that the message is a lookup request, and on line 8, it encodes the
term and caches the response to be sent in a later call to pickup. Eventually, a
call to pickup for that processor will select that cached response on line 1 of
algorithm 5 and send it back to the requesting processor.

The requesting processor will receive the response in a call to dropoff and
will go down to line 10 of algorithm 6. The response includes the request ID,
which is used to look up all the pending positions associated with it in the
std::multimap. The pending positions each specify a position in a specific
pending triple to which the encoded term ID should be written. The ID is written
to that position in the pending triple, and the need for that pending triple is
decremented. If the need is zero, then the pending triple has been completely
encoded and is written to output. The term is then associated with the ID in the
processor’s local dictionary.

Returning to lookup_in_progress on line 7 of algorithm 5, every time a request
is sent, the term is associated with the request ID in a std::map. That way,
when lookup_in_progress is called, it checks that std::map to see if a request
is in progress for the current term. If so, a new pending position is created
and associated with the request ID, thus avoiding the need for an additional
lookup.
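
The bookkeeping for pending triples, pending positions, and in-progress requests can be sketched as below. The Encoder class and its method names are hypothetical, std::shared_ptr stands in for the pointer to a pending triple, a vector stands in for the output file, and the response is assumed to be handed over together with its term for simplicity.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A triple whose terms are still being encoded.
struct PendingTriple {
    std::uint64_t ids[3] = {0, 0, 0};
    int need = 3;  // number of terms not yet encoded
};

class Encoder {
    typedef std::shared_ptr<PendingTriple> TriplePtr;
    typedef std::pair<TriplePtr, int> PendingPosition;
    std::map<std::string, std::uint64_t> dictionary_;        // term -> ID
    std::map<std::string, std::uint64_t> inProgress_;        // term -> request ID
    std::multimap<std::uint64_t, PendingPosition> waiting_;  // request ID -> positions
    std::uint64_t requestCount_ = 0;

    void fill(const TriplePtr& t, int pos, std::uint64_t id) {
        t->ids[pos] = id;
        if (--t->need == 0) written.push_back(t);  // fully encoded
    }
public:
    std::vector<TriplePtr> written;  // stands in for the output file

    // Returns true if a remote lookup request must be sent for `term`.
    bool lookup(const TriplePtr& t, int pos, const std::string& term) {
        auto pending = inProgress_.find(term);
        if (pending != inProgress_.end()) {  // a request is already in progress
            waiting_.insert(std::make_pair(pending->second, PendingPosition(t, pos)));
            return false;
        }
        auto known = dictionary_.find(term);
        if (known != dictionary_.end()) {    // found in the local dictionary
            fill(t, pos, known->second);
            return false;
        }
        const std::uint64_t request = requestCount_++;  // count is the request ID
        inProgress_[term] = request;
        waiting_.insert(std::make_pair(request, PendingPosition(t, pos)));
        return true;
    }

    // Applies a lookup response for `request`, which encoded `term` as `id`.
    void respond(std::uint64_t request, const std::string& term, std::uint64_t id) {
        auto range = waiting_.equal_range(request);
        for (auto it = range.first; it != range.second; ++it)
            fill(it->second.first, it->second.second, id);
        waiting_.erase(range.first, range.second);
        inProgress_.erase(term);
        dictionary_[term] = id;
    }
};
```

A triple that repeats a term therefore issues only one request for it, and a later triple that uses the same term is filled from the local dictionary without any communication.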

One final detail remains to be explained, and that is check_termination. Since
distributed dictionary encoding is not just a matter of redistributing data
(since every request is followed by a response), determining when encoding is
finished is a delicate matter. next_lookup returns false only when no more
triples can be read. The first p (where p is the number of processors) calls to
check_termination write a special message in the buffer that indicates that the
processor will make no more requests. One such message is sent to each
processor. Each processor counts the number of such messages it receives on line
2 of algorithm 6. Subsequent calls to check_termination return -1 until the
processor has no more pending positions (i.e., all of its requests have been
answered). Then, the next p calls to check_termination write a special message
into the buffer indicating that the processor expects no more responses. One
such message is sent to each processor, and each processor counts such messages
on line 5 of algorithm 6. Subsequent calls to check_termination return -1 until
the sum of the “no more requests” counter and the “no responses expected”
counter is equal to 2p, in which case -2 is returned, guaranteeing that there
are no more messages.
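
The phases of check_termination can be summarized as a small state machine. The sketch below is an illustration only: the Termination struct, its field names, and the placeholder message strings are assumptions, and the two counters are assumed to be incremented elsewhere by dropoff as in algorithm 6.

```cpp
#include <cassert>
#include <string>

// Hypothetical termination state kept by one processor.
struct Termination {
    int p;                         // number of processors
    int announcedNoRequests = 0;   // "no more requests" messages written so far
    int announcedNoResponses = 0;  // "no responses expected" messages written so far
    int noMoreRequestsCount = 0;   // incremented on line 2 of algorithm 6
    int noResponsesCount = 0;      // incremented on line 5 of algorithm 6
    int pendingPositions = 0;      // lookups still awaiting a response

    explicit Termination(int processors) : p(processors) {}

    // Returns a destination rank, -1 for "nothing to send now", or -2 for
    // "no more messages", writing any message into `buffer`.
    int check(std::string& buffer) {
        if (announcedNoRequests < p) {
            buffer = "NO_MORE_REQUESTS";       // placeholder encoding
            return announcedNoRequests++;      // one message per processor
        }
        if (pendingPositions > 0) return -1;   // requests still unanswered
        if (announcedNoResponses < p) {
            buffer = "NO_RESPONSES_EXPECTED";  // placeholder encoding
            return announcedNoResponses++;
        }
        if (noMoreRequestsCount + noResponsesCount == 2 * p) return -2;
        return -1;
    }
};
```

Returning -2 only after both counters together reach 2p means that every processor has both stopped requesting and stopped waiting, so no further responses can be owed to anyone.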

The complexity arises because even if a processor will make no more lookup
requests, it might still service lookup requests and hence need to send
responses. Thus any one processor is not necessarily done sending messages
unless all the processors are done sending messages, which is why this explicit
termination test is required.


APPENDIX  B 
RULESETS 

This appendix contains original and restricted RDFS and OWL2RL rulesets. In the
restricted versions, special sets of terms are used to more concisely represent
restrictions. Using the terminology of Hogan et al. [19], these sets are the set
of “metaclasses” MC and the set of “metaproperties” MP. MPT = MP ∪ {rdf:type}.
To improve the brevity of patterns and restricted rules, the ∈ and ∉ symbols
will be used with MP, MPT, and MC to compress multiple restrictions, patterns,
or rules into (syntactically) single restrictions, patterns, or rules
(respectively).

B.1 RDFS-based Rulesets

This section provides some of the specific details regarding restriction of the
RDFS ruleset into the Par-RDFS ruleset. Table B.1 contains the RDFS ruleset
prior to restriction. Note that it does not contain the infinite number of
axiomatic triples of the form
rdf:_i[rdf:type->rdfs:ContainerMembershipProperty] where i is any positive
integer. Also, literal generalization (rules lg and gl from [17]) has been
excluded since, in the context of RIF inference, there is no need to create RDF
blank nodes representing RDF literals.

In section 4.3.1, the restriction of the RDFS ruleset into the Par-RDFS ruleset
is described. The forced variable assignments for patterns are given in table
B.3. Table B.5 gives the rules that were not eliminated, referred to herein as
the Par-RDFS ruleset. Comparing table B.5 with table B.1 reveals which rules
were eliminated. Rules marked with an asterisk (*) in table B.5 are rules that
resulted from a split during step 2. Table B.4 gives patterns such that, when
facts matching the patterns are replicated, parallel Par-RDFS inference is
correct.

Table B.1: The RDFS Ruleset

Rule ID  | If And(...)                                               | Then Do(Assert(...))
---------+-----------------------------------------------------------+------------------------------------------------
rdf1     | ?u[?a->?y]                                                | ?a[rdf:type->rdf:Property]
rdf2     | ?u[?a->?l] External(pred:is-literal-XMLLiteral(?l))       | ?l[rdf:type->rdf:XMLLiteral]
rdfs1    | ?u[?a->?l] External(pred:is-literal-PlainLiteral(?l))     | ?l[rdf:type->rdfs:Literal]
rdfs2    | ?p[rdfs:domain->?c] ?x[?p->?y]                            | ?x[rdf:type->?c]
rdfs3    | ?p[rdfs:range->?c] ?x[?p->?y]                             | ?y[rdf:type->?c]
rdfs4a   | ?u[?a->?x]                                                | ?u[rdf:type->rdfs:Resource]
rdfs4b   | ?u[?a->?v]                                                | ?v[rdf:type->rdfs:Resource]
rdfs5    | ?p1[rdfs:subPropertyOf->?p2] ?p2[rdfs:subPropertyOf->?p3] | ?p1[rdfs:subPropertyOf->?p3]
rdfs6    | ?u[rdf:type->rdf:Property]                                | ?u[rdfs:subPropertyOf->?u]
rdfs7    | ?p1[rdfs:subPropertyOf->?p2] ?x[?p1->?y]                  | ?x[?p2->?y]
rdfs8    | ?u[rdf:type->rdfs:Class]                                  | ?u[rdfs:subClassOf->rdfs:Resource]
rdfs9    | ?c1[rdfs:subClassOf->?c2] ?x[rdf:type->?c1]               | ?x[rdf:type->?c2]
rdfs10   | ?u[rdf:type->rdfs:Class]                                  | ?u[rdfs:subClassOf->?u]
rdfs11   | ?c1[rdfs:subClassOf->?c2] ?c2[rdfs:subClassOf->?c3]       | ?c1[rdfs:subClassOf->?c3]
rdfs12   | ?u[rdf:type->rdfs:ContainerMembershipProperty]            | ?u[rdfs:subPropertyOf->rdfs:member]
rdfs13   | ?u[rdf:type->rdfs:Datatype]                               | ?u[rdfs:subClassOf->rdfs:Literal]
rdfsax1  |                                                           | rdf:type[rdf:type->rdf:Property]
rdfsax2  |                                                           | rdf:subject[rdf:type->rdf:Property]
rdfsax3  |                                                           | rdf:predicate[rdf:type->rdf:Property]
rdfsax4  |                                                           | rdf:object[rdf:type->rdf:Property]
rdfsax5  |                                                           | rdf:first[rdf:type->rdf:Property]
rdfsax6  |                                                           | rdf:rest[rdf:type->rdf:Property]
rdfsax7  |                                                           | rdf:value[rdf:type->rdf:Property]
rdfsax8  |                                                           | rdf:nil[rdf:type->rdf:List]
rdfsax9  |                                                           | rdf:type[rdfs:domain->rdf:Property]
rdfsax10 |                                                           | rdfs:domain[rdfs:domain->rdf:Property]
rdfsax11 |                                                           | rdfs:range[rdfs:domain->rdf:Property]
rdfsax12 |                                                           | rdfs:subPropertyOf[rdfs:domain->rdf:Property]
rdfsax13 |                                                           | rdfs:subClassOf[rdfs:domain->rdfs:Class]
rdfsax14 |                                                           | rdf:subject[rdfs:domain->rdf:Statement]
rdfsax15 |                                                           | rdf:predicate[rdfs:domain->rdf:Statement]
rdfsax16 |                                                           | rdf:object[rdfs:domain->rdf:Statement]
rdfsax17 |                                                           | rdfs:member[rdfs:domain->rdfs:Resource]
rdfsax18 |                                                           | rdf:first[rdfs:domain->rdf:List]
rdfsax19 |                                                           | rdf:rest[rdfs:domain->rdf:List]
rdfsax20 |                                                           | rdfs:seeAlso[rdfs:domain->rdfs:Resource]
rdfsax21 |                                                           | rdfs:isDefinedBy[rdfs:domain->rdfs:Resource]
rdfsax22 |                                                           | rdfs:comment[rdfs:domain->rdfs:Resource]
rdfsax23 |                                                           | rdfs:label[rdfs:domain->rdfs:Resource]
rdfsax24 |                                                           | rdf:value[rdfs:domain->rdfs:Resource]
rdfsax25 |                                                           | rdf:type[rdfs:range->rdfs:Class]
rdfsax26 |                                                           | rdfs:domain[rdfs:range->rdfs:Class]
rdfsax27 |                                                           | rdfs:range[rdfs:range->rdfs:Class]
rdfsax28 |                                                           | rdfs:subPropertyOf[rdfs:range->rdf:Property]
rdfsax29 |                                                           | rdfs:subClassOf[rdfs:range->rdfs:Class]
rdfsax30 |                                                           | rdf:subject[rdfs:range->rdfs:Resource]
rdfsax31 |                                                           | rdf:predicate[rdfs:range->rdfs:Resource]
rdfsax32 |                                                           | rdf:object[rdfs:range->rdfs:Resource]
rdfsax33 |                                                           | rdfs:member[rdfs:range->rdfs:Resource]
rdfsax34 |                                                           | rdf:first[rdfs:range->rdfs:Resource]
rdfsax35 |                                                           | rdf:rest[rdfs:range->rdf:List]
rdfsax36 |                                                           | rdfs:seeAlso[rdfs:range->rdfs:Resource]
rdfsax37 |                                                           | rdfs:isDefinedBy[rdfs:range->rdfs:Resource]
rdfsax38 |                                                           | rdfs:comment[rdfs:range->rdfs:Literal]
rdfsax39 |                                                           | rdfs:label[rdfs:range->rdfs:Resource]
rdfsax40 |                                                           | rdf:value[rdfs:range->rdfs:Resource]
rdfsax41 |                                                           | rdf:Alt[rdfs:subClassOf->rdfs:Container]
rdfsax42 |                                                           | rdf:Bag[rdfs:subClassOf->rdfs:Container]
rdfsax43 |                                                           | rdf:Seq[rdfs:subClassOf->rdfs:Container]
rdfsax44 |                                                           | rdfs:ContainerMembershipProperty[rdfs:subClassOf->rdf:Property]
rdfsax45 |                                                           | rdfs:isDefinedBy[rdfs:subPropertyOf->rdfs:seeAlso]
rdfsax46 |                                                           | rdf:XMLLiteral[rdf:type->rdfs:Datatype]
rdfsax47 |                                                           | rdf:XMLLiteral[rdfs:subClassOf->rdfs:Literal]
rdfsax48 |                                                           | rdfs:Datatype[rdfs:subClassOf->rdfs:Class]

Table B.2: The RDFS Metaclasses and Metaproperties

MC                                | MP
----------------------------------+--------------------
rdfs:Class                        | rdfs:domain
rdfs:Datatype                     | rdfs:range
rdfs:ContainerMembershipProperty  | rdfs:subClassOf
                                  | rdfs:subPropertyOf

Table B.3: Forced Assignments from Steps 1 and 3 of the Methodology
applied to the RDFS ruleset

Replicate (a)                                     | Arbitrary (e)
--------------------------------------------------+-----------------------------------
And(?x1[?x2->?x3] ?x2 ∈ MP)                       | And(?x1[?x2->?x3] ?x2 ∉ MPT)
And(?x1[rdf:type->?x3] ?x3 ∈ MC)                  | And(?x1[rdf:type->?x3] ?x3 ∉ MC)
And(External(pred:is-literal-XMLLiteral(?x1)))    |
And(External(pred:is-literal-PlainLiteral(?x1)))  |

Table B.4: Replication Patterns for Correct, Parallel Par-RDFS Inference

Replicate (a):
And(?x1[?x2->?x3] ?x2 ∈ MP)
And(?x1[rdf:type->?x3] ?x3 ∈ MC)
And(External(pred:is-literal-XMLLiteral(?x1)))
And(External(pred:is-literal-PlainLiteral(?x1)))
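The frame patterns of Table B.4 can be read as a placement test on individual triples. The following is a minimal sketch under stated assumptions, not the implementation used in this thesis: triples are plain Python tuples of prefixed names, the two literal patterns are left out, and the arbitrary placement rule (a CRC32 hash of the subject) and the function names `is_replicated` and `distribute` are hypothetical choices made only for illustration.

```python
import zlib

RDF_TYPE = "rdf:type"
# Metaclasses and metaproperties as listed in Table B.2.
MC = {"rdfs:Class", "rdfs:Datatype", "rdfs:ContainerMembershipProperty"}
MP = {"rdfs:domain", "rdfs:range", "rdfs:subClassOf", "rdfs:subPropertyOf"}


def is_replicated(triple):
    """True if the triple matches one of the two frame patterns of Table B.4."""
    _, p, o = triple
    return p in MP or (p == RDF_TYPE and o in MC)


def distribute(triples, num_procs):
    """Replicate matching triples everywhere; place every other triple once."""
    parts = [[] for _ in range(num_procs)]
    for t in triples:
        if is_replicated(t):
            for part in parts:
                part.append(t)
        else:
            # Hypothetical arbitrary placement: hash the subject.
            idx = zlib.crc32(t[0].encode("utf-8")) % num_procs
            parts[idx].append(t)
    return parts


if __name__ == "__main__":
    data = [
        ("ex:Dog", "rdfs:subClassOf", "ex:Animal"),
        ("ex:rex", "rdf:type", "ex:Dog"),
    ]
    for i, part in enumerate(distribute(data, 2)):
        print(i, part)
```

Any placement function could replace the hash here, since the only requirement on non-matching triples is that they land on some processor.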


Table B.5: The Par-RDFS Ruleset

Rule ID | If And(...) | Then Do(Assert(...))
rdf1 | ?u[?a->?y] | ?a[rdf:type->rdf:Property]
rdf2 | ?u[?a->?l] External(pred:is-literal-XMLLiteral(?l)) | ?l[rdf:type->rdf:XMLLiteral]
rdfs1 | ?u[?a->?l] External(pred:is-literal-PlainLiteral(?l)) | ?l[rdf:type->rdfs:Literal]
rdfs2* | ?p[rdfs:domain->?c] ?x[?p->?y] ?c ∉ MC | ?x[rdf:type->?c]
rdfs3* | ?p[rdfs:range->?c] ?x[?p->?y] ?c ∉ MC | ?y[rdf:type->?c]
rdfs4a | ?u[?a->?x] | ?u[rdf:type->rdfs:Resource]
rdfs4b | ?u[?a->?v] | ?v[rdf:type->rdfs:Resource]
rdfs5 | ?p1[rdfs:subPropertyOf->?p2] ?p2[rdfs:subPropertyOf->?p3] | ?p1[rdfs:subPropertyOf->?p3]
rdfs7a* | ?p1[rdfs:subPropertyOf->?p2] ?x[?p1->?y] ?p2 ∉ MPT | ?x[?p2->?y]
rdfs7b* | ?p1[rdfs:subPropertyOf->?p2] ?x[?p1->?y] ?y ∉ MC | ?x[?p2->?y]
rdfs8 | ?u[rdf:type->rdfs:Class] | ?u[rdfs:subClassOf->rdfs:Resource]
rdfs9* | ?c1[rdfs:subClassOf->?c2] ?x[rdf:type->?c1] ?c2 ∉ MC | ?x[rdf:type->?c2]
rdfs10 | ?u[rdf:type->rdfs:Class] | ?u[rdfs:subClassOf->?u]
rdfs11 | ?c1[rdfs:subClassOf->?c2] ?c2[rdfs:subClassOf->?c3] | ?c1[rdfs:subClassOf->?c3]
rdfs12 | ?u[rdf:type->rdfs:ContainerMembershipProperty] | ?u[rdfs:subPropertyOf->rdfs:member]
rdfs13 | ?u[rdf:type->rdfs:Datatype] | ?u[rdfs:subClassOf->rdfs:Literal]
rdfsax1 | | rdf:type[rdf:type->rdf:Property]
rdfsax2 | | rdf:subject[rdf:type->rdf:Property]
rdfsax3 | | rdf:predicate[rdf:type->rdf:Property]
rdfsax4 | | rdf:object[rdf:type->rdf:Property]
rdfsax5 | | rdf:first[rdf:type->rdf:Property]
rdfsax6 | | rdf:rest[rdf:type->rdf:Property]
rdfsax7 | | rdf:value[rdf:type->rdf:Property]
rdfsax8 | | rdf:nil[rdf:type->rdf:List]
rdfsax9 | | rdf:type[rdfs:domain->rdf:Property]
rdfsax10 | | rdfs:domain[rdfs:domain->rdf:Property]
rdfsax11 | | rdfs:range[rdfs:domain->rdf:Property]
rdfsax12 | | rdfs:subPropertyOf[rdfs:domain->rdf:Property]
rdfsax13 | | rdfs:subClassOf[rdfs:domain->rdfs:Class]
rdfsax14 | | rdf:subject[rdfs:domain->rdf:Statement]
rdfsax15 | | rdf:predicate[rdfs:domain->rdf:Statement]
rdfsax16 | | rdf:object[rdfs:domain->rdf:Statement]
rdfsax17 | | rdfs:member[rdfs:domain->rdfs:Resource]
rdfsax18 | | rdf:first[rdfs:domain->rdf:List]
rdfsax19 | | rdf:rest[rdfs:domain->rdf:List]
rdfsax20 | | rdfs:seeAlso[rdfs:domain->rdfs:Resource]
rdfsax21 | | rdfs:isDefinedBy[rdfs:domain->rdfs:Resource]
rdfsax22 | | rdfs:comment[rdfs:domain->rdfs:Resource]
rdfsax23 | | rdfs:label[rdfs:domain->rdfs:Resource]
rdfsax24 | | rdf:value[rdfs:domain->rdfs:Resource]
rdfsax25 | | rdf:type[rdfs:range->rdfs:Class]
rdfsax26 | | rdfs:domain[rdfs:range->rdfs:Class]
rdfsax27 | | rdfs:range[rdfs:range->rdfs:Class]
rdfsax28 | | rdfs:subPropertyOf[rdfs:range->rdf:Property]
rdfsax29 | | rdfs:subClassOf[rdfs:range->rdfs:Class]
rdfsax30 | | rdf:subject[rdfs:range->rdfs:Resource]
rdfsax31 | | rdf:predicate[rdfs:range->rdfs:Resource]
rdfsax32 | | rdf:object[rdfs:range->rdfs:Resource]
rdfsax33 | | rdfs:member[rdfs:range->rdfs:Resource]
rdfsax34 | | rdf:first[rdfs:range->rdfs:Resource]
rdfsax35 | | rdf:rest[rdfs:range->rdf:List]
rdfsax36 | | rdfs:seeAlso[rdfs:range->rdfs:Resource]
rdfsax37 | | rdfs:isDefinedBy[rdfs:range->rdfs:Resource]
rdfsax38 | | rdfs:comment[rdfs:range->rdfs:Literal]
rdfsax39 | | rdfs:label[rdfs:range->rdfs:Resource]
rdfsax40 | | rdf:value[rdfs:range->rdfs:Resource]
rdfsax41 | | rdf:Alt[rdfs:subClassOf->rdfs:Container]
rdfsax42 | | rdf:Bag[rdfs:subClassOf->rdfs:Container]
rdfsax43 | | rdf:Seq[rdfs:subClassOf->rdfs:Container]
rdfsax44 | | rdfs:ContainerMembershipProperty[rdfs:subClassOf->rdf:Property]
rdfsax45 | | rdfs:isDefinedBy[rdfs:subPropertyOf->rdfs:seeAlso]
rdfsax46 | | rdf:XMLLiteral[rdf:type->rdfs:Datatype]
rdfsax47 | | rdf:XMLLiteral[rdfs:subClassOf->rdfs:Literal]
rdfsax48 | | rdfs:Datatype[rdfs:subClassOf->rdfs:Class]

B.2 OWL2-based Rulesets

This section provides some of the specific details regarding restriction of the
OWL2RL ruleset into the Par-OWL2 ruleset. Table B.6 contains the OWL2RL
ruleset prior to restriction. Note that the OWL2RL rules presented herein
differ from those in [3]. They are a RIF variation of the OWL2RL rules from
[61], deviating to make the rules amenable to forward-chaining, following advice
from [61] as well.

In section 4.3.2, the restriction of the OWL2RL ruleset into the Par-OWL2 ruleset
is briefly described. The forced variable assignments for patterns are given in table
B.8. Table B.10 gives the rules that were not eliminated, referred to herein as the
Par-OWL2 ruleset. Comparing table B.10 with table B.6 reveals which rules were
eliminated. Rules marked with an asterisk (*) in table B.10 are rules that resulted
from a split during step 2. Table B.9 gives patterns such that, when facts matching
the patterns are replicated, parallel Par-OWL2 inference is correct.

Rules marked with a dagger (†) in table B.10 are rules that were excluded in
the evaluation. The ruleset consisting of the Par-OWL2 rules not marked with a
dagger is referred to as the Par-MemOWL2 ruleset. Discussion regarding the reason
for further restriction is given in section 5.1.4.

Table B.6: The OWL2RL Ruleset

scm-int | ?c[owl:intersectionOf->?l] | _markAllTypes(?c ?l)
scm-int-1 | _markAllTypes(?c ?r) ?r[rdf:rest->?l] Not(?l = rdf:nil) | _markAllTypes(?c ?l)
scm-int-2 | _markAllTypes(?c ?l) ?l[rdf:first->?ci] | ?c[rdfs:subClassOf->?ci]
scm-uni | ?c[owl:unionOf->?l] | _checkUnionOf(?c ?l)
scm-uni-1 | _checkUnionOf(?c ?r) ?r[rdf:rest->?l] Not(?l = rdf:nil) | _checkUnionOf(?c ?l)
scm-uni-2 | _checkUnionOf(?c ?l) ?l[rdf:first->?ci] | ?ci[rdfs:subClassOf->?c]
scm-cls | ?c[rdf:type->owl:Class] | ?c[rdfs:subClassOf->?c]
scm-cls1 | ?c[rdf:type->owl:Class] | ?c[owl:equivalentClass->?c]
scm-cls2 | ?c[rdf:type->owl:Class] | ?c[rdfs:subClassOf->owl:Thing]
scm-cls3 | ?c[rdf:type->owl:Class] | owl:Nothing[rdfs:subClassOf->?c]
scm-sco | ?c1[rdfs:subClassOf->?c2] ?c2[rdfs:subClassOf->?c3] | ?c1[rdfs:subClassOf->?c3]
scm-eqc1 | ?c1[owl:equivalentClass->?c2] | ?c1[rdfs:subClassOf->?c2]
scm-eqc11 | ?c1[owl:equivalentClass->?c2] | ?c2[rdfs:subClassOf->?c1]
scm-eqc2 | ?c1[rdfs:subClassOf->?c2] ?c2[rdfs:subClassOf->?c1] | ?c1[owl:equivalentClass->?c2]
scm-op | ?p[rdf:type->owl:ObjectProperty] | ?p[rdfs:subPropertyOf->?p]
scm-op1 | ?p[rdf:type->owl:ObjectProperty] | ?p[owl:equivalentProperty->?p]
scm-dp | ?p[rdf:type->owl:DatatypeProperty] | ?p[rdfs:subPropertyOf->?p]
scm-dp1 | ?p[rdf:type->owl:DatatypeProperty] | ?p[owl:equivalentProperty->?p]
scm-spo | ?p1[rdfs:subPropertyOf->?p2] ?p2[rdfs:subPropertyOf->?p3] | ?p1[rdfs:subPropertyOf->?p3]
scm-eqp1 | ?p1[owl:equivalentProperty->?p2] | ?p1[rdfs:subPropertyOf->?p2]
scm-eqp11 | ?p1[owl:equivalentProperty->?p2] | ?p2[rdfs:subPropertyOf->?p1]
scm-eqp2 | ?p1[rdfs:subPropertyOf->?p2] ?p2[rdfs:subPropertyOf->?p1] | ?p1[owl:equivalentProperty->?p2]
scm-dom1 | ?p[rdfs:domain->?c1] ?c1[rdfs:subClassOf->?c2] | ?p[rdfs:domain->?c2]
scm-dom2 | ?p2[rdfs:domain->?c] ?p1[rdfs:subPropertyOf->?p2] | ?p1[rdfs:domain->?c]
scm-rng1 | ?p[rdfs:range->?c1] ?c1[rdfs:subClassOf->?c2] | ?p[rdfs:range->?c2]
scm-rng2 | ?p2[rdfs:range->?c] ?p1[rdfs:subPropertyOf->?p2] | ?p1[rdfs:range->?c]
scm-hv | ?c1[owl:hasValue->?i] ?c1[owl:onProperty->?p1] ?c2[owl:hasValue->?i] ?c2[owl:onProperty->?p2] ?p1[rdfs:subPropertyOf->?p2] | ?c1[rdfs:subClassOf->?c2]
scm-svf1 | ?c1[owl:someValuesFrom->?y1] ?c1[owl:onProperty->?p] ?c2[owl:someValuesFrom->?y2] ?c2[owl:onProperty->?p] ?y1[rdfs:subClassOf->?y2] | ?c1[rdfs:subClassOf->?c2]
scm-svf2 | ?c1[owl:someValuesFrom->?y] ?c1[owl:onProperty->?p1] ?c2[owl:someValuesFrom->?y] ?c2[owl:onProperty->?p2] ?p1[rdfs:subPropertyOf->?p2] | ?c1[rdfs:subClassOf->?c2]
scm-avf1 | ?c1[owl:allValuesFrom->?y1] ?c1[owl:onProperty->?p] ?c2[owl:allValuesFrom->?y2] ?c2[owl:onProperty->?p] ?y1[rdfs:subClassOf->?y2] | ?c1[rdfs:subClassOf->?c2]
scm-avf2 | ?c1[owl:allValuesFrom->?y] ?c1[owl:onProperty->?p1] ?c2[owl:allValuesFrom->?y] ?c2[owl:onProperty->?p2] ?p1[rdfs:subPropertyOf->?p2] | ?c2[rdfs:subClassOf->?c1]
eq-ref | ?s[?p->?o] | ?s[owl:sameAs->?s]
eq-ref1 | ?s[?p->?o] | ?p[owl:sameAs->?p]
eq-ref2 | ?s[?p->?o] | ?o[owl:sameAs->?o]
eq-sym | ?x[owl:sameAs->?y] | ?y[owl:sameAs->?x]
eq-trans | ?x[owl:sameAs->?y] ?y[owl:sameAs->?z] | ?x[owl:sameAs->?z]
eq-rep-s | ?s[owl:sameAs->?s2] ?s[?p->?o] | ?s2[?p->?o]
eq-rep-p | ?p[owl:sameAs->?p2] ?s[?p->?o] | ?s[?p2->?o]
eq-rep-o | ?o[owl:sameAs->?o2] ?s[?p->?o] | ?s[?p->?o2]
eq-diff1 | ?x[owl:sameAs->?y] ?x[owl:differentFrom->?y] | rif:error()
prp-ap-l | | rdfs:label[rdf:type->owl:AnnotationProperty]
prp-ap-c | | rdfs:comment[rdf:type->owl:AnnotationProperty]
prp-ap-sa | | rdfs:seeAlso[rdf:type->owl:AnnotationProperty]
prp-ap-idb | | rdfs:isDefinedBy[rdf:type->owl:AnnotationProperty]
prp-ap-d | | owl:deprecated[rdf:type->owl:AnnotationProperty]
prp-ap-pv | | owl:priorVersion[rdf:type->owl:AnnotationProperty]
prp-ap-bcw | | owl:backwardCompatibleWith[rdf:type->owl:AnnotationProperty]
prp-ap-iw | | owl:incompatibleWith[rdf:type->owl:AnnotationProperty]
prp-dom | ?p[rdfs:domain->?c] ?x[?p->?y] | ?x[rdf:type->?c]
prp-rng | ?p[rdfs:range->?c] ?x[?p->?y] | ?y[rdf:type->?c]
prp-fp | ?p[rdf:type->owl:FunctionalProperty] ?x[?p->?y1] ?x[?p->?y2] | ?y1[owl:sameAs->?y2]
prp-ifp | ?p[rdf:type->owl:InverseFunctionalProperty] ?x1[?p->?y] ?x2[?p->?y] | ?x1[owl:sameAs->?x2]
prp-irp | ?p[rdf:type->owl:IrreflexiveProperty] ?x[?p->?x] | rif:error()
prp-symp | ?p[rdf:type->owl:SymmetricProperty] ?x[?p->?y] | ?y[?p->?x]
prp-asyp | ?p[rdf:type->owl:AsymmetricProperty] ?x[?p->?y] ?y[?p->?x] | rif:error()
prp-trp | ?p[rdf:type->owl:TransitiveProperty] ?x[?p->?y] ?y[?p->?z] | ?x[?p->?z]
prp-spo1 | ?p1[rdfs:subPropertyOf->?p2] ?x[?p1->?y] | ?x[?p2->?y]
prp-eqp1 | ?p1[owl:equivalentProperty->?p2] ?x[?p1->?y] | ?x[?p2->?y]
prp-eqp2 | ?p1[owl:equivalentProperty->?p2] ?x[?p2->?y] | ?x[?p1->?y]
prp-pdw | ?p1[owl:propertyDisjointWith->?p2] ?x[?p1->?y] ?x[?p2->?y] | rif:error()
prp-inv1 | ?p1[owl:inverseOf->?p2] ?x[?p1->?y] | ?y[?p2->?x]
prp-inv2 | ?p1[owl:inverseOf->?p2] ?x[?p2->?y] | ?y[?p1->?x]
cls-thing | | owl:Thing[rdf:type->owl:Class]
cls-nothing1 | | owl:Nothing[rdf:type->owl:Class]
cls-nothing2 | ?x[rdf:type->owl:Nothing] | rif:error()
cls-svf1 | ?x[owl:someValuesFrom->?y] ?x[owl:onProperty->?p] ?u[?p->?v] ?v[rdf:type->?y] | ?u[rdf:type->?x]
cls-svf2 | ?x[owl:someValuesFrom->owl:Thing] ?x[owl:onProperty->?p] ?u[?p->?v] | ?u[rdf:type->?x]
cls-avf | ?x[owl:allValuesFrom->?y] ?x[owl:onProperty->?p] ?u[rdf:type->?x] ?u[?p->?v] | ?v[rdf:type->?y]
cls-hv1 | ?x[owl:hasValue->?y] ?x[owl:onProperty->?p] ?u[rdf:type->?x] | ?u[?p->?y]
cls-hv2 | ?x[owl:hasValue->?y] ?x[owl:onProperty->?p] ?u[?p->?y] | ?u[rdf:type->?x]
cls-maxc1 | ?x[owl:maxCardinality->0] ?x[owl:onProperty->?p] ?u[rdf:type->?x] ?u[?p->?y] | rif:error()
cls-maxc2 | ?x[owl:maxCardinality->1] ?x[owl:onProperty->?p] ?u[rdf:type->?x] ?u[?p->?y1] ?u[?p->?y2] | ?y1[owl:sameAs->?y2]
cls-maxqc1 | ?x[owl:maxQualifiedCardinality->0] ?x[owl:onProperty->?p] ?x[owl:onClass->?c] ?u[rdf:type->?x] ?u[?p->?y] ?y[rdf:type->?c] | rif:error()
cls-maxqc2 | ?x[owl:maxQualifiedCardinality->0] ?x[owl:onProperty->?p] ?x[owl:onClass->owl:Thing] ?u[rdf:type->?x] ?u[?p->?y] | rif:error()
cls-maxqc3 | ?x[owl:maxQualifiedCardinality->1] ?x[owl:onProperty->?p] ?x[owl:onClass->?c] ?u[rdf:type->?x] ?u[?p->?y1] ?y1[rdf:type->?c] ?u[?p->?y2] ?y2[rdf:type->?c] | ?y1[owl:sameAs->?y2]
cls-maxqc4 | ?x[owl:maxQualifiedCardinality->1] ?x[owl:onProperty->?p] ?x[owl:onClass->owl:Thing] ?u[rdf:type->?x] ?u[?p->?y1] ?u[?p->?y2] | ?y1[owl:sameAs->?y2]
cax-sco | ?c1[rdfs:subClassOf->?c2] ?x[rdf:type->?c1] | ?x[rdf:type->?c2]
cax-eqc1 | ?c1[owl:equivalentClass->?c2] ?x[rdf:type->?c1] | ?x[rdf:type->?c2]
cax-eqc2 | ?c1[owl:equivalentClass->?c2] ?x[rdf:type->?c2] | ?x[rdf:type->?c1]
cax-dw | ?c1[owl:disjointWith->?c2] ?x[rdf:type->?c1] ?x[rdf:type->?c2] | rif:error()
prp-npa1 | ?x[owl:sourceIndividual->?i1] ?x[owl:assertionProperty->?p] ?x[owl:targetIndividual->?i2] ?i1[?p->?i2] | rif:error()
prp-npa2 | ?x[owl:sourceIndividual->?i] ?x[owl:assertionProperty->?p] ?x[owl:targetValue->?lt] ?i[?p->?lt] | rif:error()
cls-com | ?c1[owl:complementOf->?c2] ?x[rdf:type->?c1] ?x[rdf:type->?c2] | rif:error()
eq-diff2a | ?x[rdf:type->owl:AllDifferent] ?x[owl:distinctMembers->?y] | _checkDifferent(?x ?y)
eq-diff3a | ?x[rdf:type->owl:AllDifferent] ?x[owl:members->?y] | _checkDifferent(?x ?y)
eq-diff23b | _checkDifferent(?x ?z) ?z[rdf:rest->?y] Not(?y = rdf:nil) | _checkDifferent(?x ?y)
eq-diff23c | _checkDifferent(?x ?y1) _checkDifferent(?x ?y2) Not(?y1 = ?y2) ?y1[rdf:first->?z1] ?y2[rdf:first->?z2] ?z1[owl:sameAs->?z2] | rif:error()
prp-adp | ?r[rdf:type->owl:AllDisjointProperties] ?r[owl:members->?l] | _checkDisjointProperties(?r ?l)
prp-adp-1 | _checkDisjointProperties(?r ?x) ?x[rdf:rest->?l] Not(?l = rdf:nil) | _checkDisjointProperties(?r ?l)
prp-adp-2 | _checkDisjointProperties(?r ?l1) _checkDisjointProperties(?r ?l2) Not(?l1 = ?l2) ?l1[rdf:first->?x] ?l2[rdf:first->?y] ?o[?x->?v] ?o[?y->?v] | rif:error()
cax-adc | ?r[rdf:type->owl:AllDisjointClasses] ?r[owl:members->?l] | _checkDisjointClasses(?r ?l)
cax-adc-1 | _checkDisjointClasses(?r ?x) ?x[rdf:rest->?l] Not(?l = rdf:nil) | _checkDisjointClasses(?r ?l)
cax-adc-2 | _checkDisjointClasses(?r ?l1) _checkDisjointClasses(?r ?l2) Not(?l1 = ?l2) ?l1[rdf:first->?x] ?l2[rdf:first->?y] ?o[rdf:type->?x] ?o[rdf:type->?y] | rif:error()
prp-spo2 | ?p[owl:propertyChainAxiom->?pc] | _markCheckChain(?p ?pc)
prp-spo2-1 | _markCheckChain(?p ?q) ?q[rdf:rest->?pc] Not(?pc = rdf:nil) | _markCheckChain(?p ?pc)
prp-spo2-2 | _markCheckChain(?q ?pc) ?pc[rdf:first->?p] ?pc[rdf:rest->rdf:nil] ?start[?p->?last] | _checkChain(?q ?start ?pc ?last)
prp-spo2-3 | ?pc[rdf:first->?p] ?pc[rdf:rest->?tl] ?start[?p->?next] _checkChain(?q ?next ?tl ?last) | _checkChain(?q ?start ?pc ?last)
prp-spo2-4 | ?p[owl:propertyChainAxiom->?pc] _checkChain(?p ?start ?pc ?last) | ?start[?p->?last]
cls-int1 | _markAllTypes(?c ?l) ?l[rdf:first->?ty] ?l[rdf:rest->rdf:nil] ?y[rdf:type->?ty] | _allTypes(?c ?l ?y)
cls-int1-1 | ?l[rdf:first->?ty] ?l[rdf:rest->?tl] ?y[rdf:type->?ty] _allTypes(?c ?tl ?y) | _allTypes(?c ?l ?y)
cls-int1-2 | ?c[owl:intersectionOf->?l] _allTypes(?c ?l ?y) | ?y[rdf:type->?c]
prp-key | ?c[owl:hasKey->?u] | _markSameKey(?c ?u)
prp-key-1 | _markSameKey(?c ?v) ?v[rdf:rest->?u] Not(?u = rdf:nil) | _markSameKey(?c ?u)
prp-key-2 | _markSameKey(?c ?u) ?u[rdf:first->?key] ?u[rdf:rest->rdf:nil] ?x[?key->?v] ?y[?key->?v] | _sameKey(?c ?u ?x ?y)
prp-key-3 | ?u[rdf:first->?key] ?u[rdf:rest->?tl] ?x[?key->?v] ?y[?key->?v] _sameKey(?c ?tl ?x ?y) | _sameKey(?c ?u ?x ?y)
prp-key-4 | ?c[owl:hasKey->?u] ?x[rdf:type->?c] ?y[rdf:type->?c] _sameKey(?c ?u ?x ?y) | ?x[owl:sameAs->?y]
cls-uni | _checkUnionOf(?c ?l) ?l[rdf:first->?ci] ?y[rdf:type->?ci] | ?y[rdf:type->?c]
cls-oo-a | ?c[owl:oneOf->?l] | _checkOneOf(?c ?l)
cls-oo-b | _checkOneOf(?c ?r) ?r[rdf:rest->?l] Not(?l = rdf:nil) | _checkOneOf(?c ?l)
cls-oo-c | _checkOneOf(?c ?l) ?l[rdf:first->?yi] | ?yi[rdf:type->?c]
cls-int2 | _markAllTypes(?c ?l) ?l[rdf:first->?ci] ?y[rdf:type->?c] | ?y[rdf:type->?ci]

Table B.7: The OWL2RL Metaclasses and Metaproperties

MC: owl:FunctionalProperty, owl:InverseFunctionalProperty,
owl:IrreflexiveProperty, owl:SymmetricProperty, owl:AsymmetricProperty,
owl:TransitiveProperty, owl:Class, owl:ObjectProperty, owl:DatatypeProperty,
owl:AllDifferent, owl:AllDisjointProperties, owl:AllDisjointClasses

MP: rdfs:domain, rdfs:range, rdfs:subPropertyOf, owl:equivalentProperty,
owl:propertyDisjointWith, owl:inverseOf, owl:someValuesFrom, owl:onProperty,
owl:allValuesFrom, owl:hasValue, owl:onClass, rdfs:subClassOf,
owl:equivalentClass, owl:disjointWith, owl:complementOf, owl:distinctMembers,
owl:members, owl:propertyChainAxiom, owl:intersectionOf, owl:hasKey,
owl:unionOf, owl:oneOf, rdf:first, rdf:rest, owl:maxCardinality,
owl:maxQualifiedCardinality

Table B.8: Forced Assignments from Steps 1 and 3 of the Methodology
applied to the OWL2RL ruleset

Replicate (a):
And(?x1[?x2->?x3] ?x2 ∈ MP)
And(?x1[rdf:type->?x3] ?x3 ∈ MC)
And(_markAllTypes(?x1 ?x2))
And(_allTypes(?x1 ?x2))
And(_checkUnionOf(?x1 ?x2))
And(_checkDifferent(?x1 ?x2))
And(_checkDisjointProperties(?x1 ?x2))
And(_checkDisjointClasses(?x1 ?x2))
And(_markCheckChain(?x1 ?x2))
And(_checkChain(?x1 ?x2 ?x3 ?x4))
And(_markSameKey(?x1 ?x2))
And(_sameKey(?x1 ?x2 ?x3 ?x4))
And(?x1 = ?x2)

Arbitrary (e):
And(?x1[?x2->?x3] ?x2 ∉ MPT)
And(?x1[rdf:type->?x3] ?x3 ∉ MC)

Table B.9: Replication Patterns for Correct, Parallel Par-OWL2 Inference

Replicate (a):
And(?x1[?x2->?x3] ?x2 ∈ MP)
And(?x1[rdf:type->?x3] ?x3 ∈ MC)
And(_markAllTypes(?x1 ?x2))
And(_allTypes(?x1 ?x2))
And(_checkUnionOf(?x1 ?x2))
And(_checkDifferent(?x1 ?x2))
And(_checkDisjointProperties(?x1 ?x2))
And(_checkDisjointClasses(?x1 ?x2))
And(_markCheckChain(?x1 ?x2))
And(_checkChain(?x1 ?x2 ?x3 ?x4))
And(_markSameKey(?x1 ?x2))
And(_sameKey(?x1 ?x2 ?x3 ?x4))
And(?x1 = ?x2)
And(_checkOneOf(?x1 ?x2))


Table B.10: The Par-OWL2 Ruleset

Rule ID | If And(...) | Then Do(Assert(...))
scm-int1 | ?c[owl:intersectionOf->?l] | _markAllTypes(?c ?l)
scm-int2 | _markAllTypes(?c ?r) ?r[rdf:rest->?l] Not(?l = rdf:nil) | _markAllTypes(?c ?l)
scm-int3 | _markAllTypes(?c ?l) ?l[rdf:first->?ci] | ?c[rdfs:subClassOf->?ci]
scm-uni1 | ?c[owl:unionOf->?l] | _checkUnionOf(?c ?l)
scm-uni2 | _checkUnionOf(?c ?r) ?r[rdf:rest->?l] Not(?l = rdf:nil) | _checkUnionOf(?c ?l)
scm-uni3 | _checkUnionOf(?c ?l) ?l[rdf:first->?ci] | ?ci[rdfs:subClassOf->?c]
scm-cls† | ?c[rdf:type->owl:Class] | ?c[rdfs:subClassOf->?c]
scm-cls1† | ?c[rdf:type->owl:Class] | ?c[owl:equivalentClass->?c]
scm-cls2† | ?c[rdf:type->owl:Class] | ?c[rdfs:subClassOf->owl:Thing]
scm-cls3† | ?c[rdf:type->owl:Class] | owl:Nothing[rdfs:subClassOf->?c]
scm-sco | ?c1[rdfs:subClassOf->?c2] ?c2[rdfs:subClassOf->?c3] | ?c1[rdfs:subClassOf->?c3]
scm-eqc1 | ?c1[owl:equivalentClass->?c2] | ?c1[rdfs:subClassOf->?c2]
scm-eqc11 | ?c1[owl:equivalentClass->?c2] | ?c2[rdfs:subClassOf->?c1]
scm-eqc2 | ?c1[rdfs:subClassOf->?c2] ?c2[rdfs:subClassOf->?c1] | ?c1[owl:equivalentClass->?c2]
scm-op† | ?p[rdf:type->owl:ObjectProperty] | ?p[rdfs:subPropertyOf->?p]
scm-op1† | ?p[rdf:type->owl:ObjectProperty] | ?p[owl:equivalentProperty->?p]
scm-dp† | ?p[rdf:type->owl:DatatypeProperty] | ?p[rdfs:subPropertyOf->?p]
scm-dp1† | ?p[rdf:type->owl:DatatypeProperty] | ?p[owl:equivalentProperty->?p]
scm-spo | ?p1[rdfs:subPropertyOf->?p2] ?p2[rdfs:subPropertyOf->?p3] | ?p1[rdfs:subPropertyOf->?p3]
scm-eqp1 | ?p1[owl:equivalentProperty->?p2] | ?p1[rdfs:subPropertyOf->?p2]
scm-eqp11 | ?p1[owl:equivalentProperty->?p2] | ?p2[rdfs:subPropertyOf->?p1]
scm-eqp2 | ?p1[rdfs:subPropertyOf->?p2] ?p2[rdfs:subPropertyOf->?p1] | ?p1[owl:equivalentProperty->?p2]
scm-dom1 | ?p[rdfs:domain->?c1] ?c1[rdfs:subClassOf->?c2] | ?p[rdfs:domain->?c2]
scm-dom2 | ?p2[rdfs:domain->?c] ?p1[rdfs:subPropertyOf->?p2] | ?p1[rdfs:domain->?c]
scm-rng1 | ?p[rdfs:range->?c1] ?c1[rdfs:subClassOf->?c2] | ?p[rdfs:range->?c2]
scm-rng2 | ?p2[rdfs:range->?c] ?p1[rdfs:subPropertyOf->?p2] | ?p1[rdfs:range->?c]
scm-hv | ?c1[owl:hasValue->?i] ?c1[owl:onProperty->?p1] ?c2[owl:hasValue->?i] ?c2[owl:onProperty->?p2] ?p1[rdfs:subPropertyOf->?p2] | ?c1[rdfs:subClassOf->?c2]
scm-svf1 | ?c1[owl:someValuesFrom->?y1] ?c1[owl:onProperty->?p] ?c2[owl:someValuesFrom->?y2] ?c2[owl:onProperty->?p] ?y1[rdfs:subClassOf->?y2] | ?c1[rdfs:subClassOf->?c2]
scm-svf2 | ?c1[owl:someValuesFrom->?y] ?c1[owl:onProperty->?p1] ?c2[owl:someValuesFrom->?y] ?c2[owl:onProperty->?p2] ?p1[rdfs:subPropertyOf->?p2] | ?c1[rdfs:subClassOf->?c2]
scm-avf1 | ?c1[owl:allValuesFrom->?y1] ?c1[owl:onProperty->?p] ?c2[owl:allValuesFrom->?y2] ?c2[owl:onProperty->?p] ?y1[rdfs:subClassOf->?y2] | ?c1[rdfs:subClassOf->?c2]
scm-avf2 | ?c1[owl:allValuesFrom->?y] ?c1[owl:onProperty->?p1] ?c2[owl:allValuesFrom->?y] ?c2[owl:onProperty->?p2] ?p1[rdfs:subPropertyOf->?p2] | ?c2[rdfs:subClassOf->?c1]
eq-ref† | ?s[?p->?o] | ?s[owl:sameAs->?s]
eq-ref1† | ?s[?p->?o] | ?p[owl:sameAs->?p]
eq-ref2† | ?s[?p->?o] | ?o[owl:sameAs->?o]
eq-sym | ?x[owl:sameAs->?y] | ?y[owl:sameAs->?x]
prp-ap-l† | | rdfs:label[rdf:type->owl:AnnotationProperty]
prp-ap-c† | | rdfs:comment[rdf:type->owl:AnnotationProperty]
prp-ap-sa† | | rdfs:seeAlso[rdf:type->owl:AnnotationProperty]
prp-ap-idb† | | rdfs:isDefinedBy[rdf:type->owl:AnnotationProperty]
prp-ap-d† | | owl:deprecated[rdf:type->owl:AnnotationProperty]
prp-ap-pv† | | owl:priorVersion[rdf:type->owl:AnnotationProperty]
prp-ap-bcw† | | owl:backwardCompatibleWith[rdf:type->owl:AnnotationProperty]
prp-ap-iw† | | owl:incompatibleWith[rdf:type->owl:AnnotationProperty]
prp-dom* | ?p[rdfs:domain->?c] ?x[?p->?y] ?c ∉ MC | ?x[rdf:type->?c]
prp-rng* | ?p[rdfs:range->?c] ?x[?p->?y] ?c ∉ MC | ?y[rdf:type->?c]
prp-irp | ?p[rdf:type->owl:IrreflexiveProperty] ?x[?p->?x] | rif:error()
prp-symp1* | ?p[rdf:type->owl:SymmetricProperty] ?x[?p->?y] ?p ∉ MPT | ?y[?p->?x]
prp-symp2 | ?p[rdf:type->owl:SymmetricProperty] ?x[rdf:type->?y] ?x ∉ MC | ?y[rdf:type->?x]
prp-trp* | ?p[rdf:type->owl:TransitiveProperty] ?x[?p->?y] ?y[?p->?z] ?p ∈ MP | ?x[?p->?z]
prp-spo11* | ?p1[rdfs:subPropertyOf->?p2] ?x[?p1->?y] ?p2 ∉ MPT | ?x[?p2->?y]
prp-spo12* | ?p1[rdfs:subPropertyOf->rdf:type] ?x[?p1->?y] ?y ∉ MC | ?x[rdf:type->?y]
prp-eqp11* | ?p1[owl:equivalentProperty->?p2] ?x[?p1->?y] ?p2 ∉ MPT | ?x[?p2->?y]
prp-eqp12* | ?p1[owl:equivalentProperty->rdf:type] ?x[?p1->?y] ?y ∉ MC | ?x[rdf:type->?y]
prp-eqp21* | ?p1[owl:equivalentProperty->?p2] ?x[?p2->?y] ?p1 ∉ MPT | ?x[?p1->?y]
prp-eqp22* | rdf:type[owl:equivalentProperty->?p2] ?x[?p2->?y] ?y ∉ MC | ?x[rdf:type->?y]
prp-inv11* | ?p1[owl:inverseOf->?p2] ?x[?p1->?y] ?p2 ∉ MPT | ?y[?p2->?x]
prp-inv12* | ?p1[owl:inverseOf->rdf:type] ?x[?p1->?y] ?x ∉ MC | ?y[rdf:type->?x]
prp-inv21* | ?p1[owl:inverseOf->?p2] ?x[?p2->?y] ?p1 ∉ MPT | ?y[?p1->?x]
prp-inv22* | rdf:type[owl:inverseOf->?p2] ?x[?p2->?y] ?x ∉ MC | ?y[rdf:type->?x]
cls-thing† | | owl:Thing[rdf:type->owl:Class]
cls-nothing1† | | owl:Nothing[rdf:type->owl:Class]
cls-nothing2 | ?x[rdf:type->owl:Nothing] | rif:error()
cls-svf2* | ?x[owl:someValuesFrom->owl:Thing] ?x[owl:onProperty->?p] ?u[?p->?v] ?x ∉ MC | ?u[rdf:type->?x]
cls-hv11* | ?x[owl:hasValue->?y] ?x[owl:onProperty->?p] ?u[rdf:type->?x] ?p ∉ MPT | ?u[?p->?y]
cls-hv12* | ?x[owl:hasValue->?y] ?x[owl:onProperty->rdf:type] ?u[rdf:type->?x] ?y ∉ MC | ?u[rdf:type->?y]
cls-hv2* | ?x[owl:hasValue->?y] ?x[owl:onProperty->?p] ?u[?p->?y] ?x ∉ MC | ?u[rdf:type->?x]
cax-sco* | ?c1[rdfs:subClassOf->?c2] ?x[rdf:type->?c1] ?c2 ∉ MC | ?x[rdf:type->?c2]
cax-eqc1* | ?c1[owl:equivalentClass->?c2] ?x[rdf:type->?c1] ?c2 ∉ MC | ?x[rdf:type->?c2]
cax-eqc2* | ?c1[owl:equivalentClass->?c2] ?x[rdf:type->?c2] ?c1 ∉ MC | ?x[rdf:type->?c1]
eq-diff2a | ?x[rdf:type->owl:AllDifferent] ?x[owl:distinctMembers->?y] | _checkDifferent(?x ?y)
eq-diff3a | ?x[rdf:type->owl:AllDifferent] ?x[owl:members->?y] | _checkDifferent(?x ?y)
eq-diff23b | _checkDifferent(?x ?z) ?z[rdf:rest->?y] Not(?y = rdf:nil) | _checkDifferent(?x ?y)
eq-diff23c | _checkDifferent(?x ?y1) _checkDifferent(?x ?y2) Not(?y1 = ?y2) ?y1[rdf:first->?z1] ?y2[rdf:first->?z2] ?z1[owl:sameAs->?z2] | rif:error()
cls-uni* | _checkUnionOf(?c ?l) ?l[rdf:first->?ci] ?y[rdf:type->?ci] ?c ∉ MC | ?y[rdf:type->?c]
cls-oo-a | ?c[owl:oneOf->?l] | _checkOneOf(?c ?l)
cls-oo-b | _checkOneOf(?c ?r) ?r[rdf:rest->?l] Not(?l = rdf:nil) | _checkOneOf(?c ?l)
cls-oo-c1* | _checkOneOf(?c ?l) ?l[rdf:first->?yi] ?c ∉ MC | ?yi[rdf:type->?c]
cls-oo-c2* | _checkOneOf(?c ?l) ?l[rdf:first->?yi] ?c ∈ MC | ?yi[rdf:type->?c]
cls-int2* | _markAllTypes(?c ?l) ?l[rdf:first->?ci] ?y[rdf:type->?c] ?ci ∉ MC | ?y[rdf:type->?ci]



B.3 Verbatim Rulesets Used in Evaluation

For the purposes of reproducibility, this section includes the rulesets exactly as
they were provided to the inference engine. They are specified in a modified
RIF-Core syntax which should be readable to those familiar with RIF-Core.
Lines beginning with the # symbol are comments, some of which are interpreted
by the rule compiler (e.g., #DEFINE and #PRAGMA). #PRAGMA is used to specify
patterns for replication or arbitrary placement. Note that unlike the definition
of Pattern used in this thesis, which utilizes negated equality formulas as
restrictions, the rules below instead specify restrictions by negating a
built-in formula with predicate pred:list-contains. This detail is obscured,
however, by using the symbols IN and NOTIN. #DEFINE behaves similarly to #define
in C++. Note that rule labels given in (* *) may not necessarily be consistent
with the names used throughout the rest of this thesis.
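To make the role of #DEFINE concrete, the following minimal sketch expands the IN and NOTIN symbols by plain textual substitution. It is an illustration under stated assumptions, not the rule compiler used in the evaluation: the assumption is that a token of the form {NAME} is replaced by the body of the matching #DEFINE line and that bodies may themselves contain such tokens. The function names `parse_defines` and `expand` are hypothetical.

```python
import re


def parse_defines(lines):
    """Collect '#DEFINE NAME body' lines into a dict of macro bodies."""
    macros = {}
    for line in lines:
        if line.startswith("#DEFINE"):
            parts = line.split(None, 2)  # ['#DEFINE', NAME, optional body]
            macros[parts[1]] = parts[2].strip() if len(parts) > 2 else ""
    return macros


def expand(text, macros, depth=0):
    """Replace each {NAME} token by its body, recursively (assumed semantics)."""
    if depth > 20:
        raise ValueError("macro expansion nested too deeply")

    def substitute(match):
        name = match.group(1)
        if name not in macros:
            return match.group(0)  # leave unknown tokens untouched
        return expand(macros[name], macros, depth + 1)

    return re.sub(r"\{([^{}\s]+)\}", substitute, text)


DEFINES = [
    "#DEFINE IN External(pred:list-contains(",
    "#DEFINE /IN ))",
    "#DEFINE NOTIN Not({IN}",
    "#DEFINE /NOTIN {/IN})",
    "#DEFINE $MP List(rdfs:domain rdfs:range rdfs:subPropertyOf rdfs:subClassOf)",
]

if __name__ == "__main__":
    macros = parse_defines(DEFINES)
    print(expand("{NOTIN}{$MP} ?p2{/NOTIN}", macros))
```

Under these assumptions the restriction on ?p2 expands to a negated pred:list-contains built-in applied to the list of metaproperties and ?p2, which is the form described in the paragraph above.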

B.3.1 Par-CoreRDFS

Prefix(rdf <http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
Prefix(rdfs <http://www.w3.org/2000/01/rdf-schema#>)
Prefix(owl <http://www.w3.org/2002/07/owl#>)
Prefix(xsd <http://www.w3.org/2001/XMLSchema#>)
Prefix(rif <http://www.w3.org/2007/rif#>)
Prefix(func <http://www.w3.org/2007/rif-builtin-function#>)
Prefix(pred <http://www.w3.org/2007/rif-builtin-predicate#>)
Prefix(dc <http://purl.org/dc/terms/>)

#DEFINE IN External(pred:list-contains(
#DEFINE /IN ))
#DEFINE NOTIN Not({IN}
#DEFINE /NOTIN {/IN})
#DEFINE $MP List(rdfs:domain rdfs:range rdfs:subPropertyOf rdfs:subClassOf)

#PRAGMA REPLICATE And(?p[rdfs:domain -> ?c])
#PRAGMA REPLICATE And(?p[rdfs:range -> ?c])
#PRAGMA REPLICATE And(?p1[rdfs:subPropertyOf -> ?p2])
#PRAGMA REPLICATE And(?c1[rdfs:subClassOf -> ?c2])
#PRAGMA ARBITRARY And(?x[?p2 -> ?y] {NOTIN}{$MP} ?p2{/NOTIN})

(* <#scm-spo> *)
Forall ?p3 ?p2 ?p1 (
  ?p1[rdfs:subPropertyOf->?p3] :- And(
    ?p1[rdfs:subPropertyOf->?p2]
    ?p2[rdfs:subPropertyOf->?p3] ))

(* <#scm-sco> *)
Forall ?c1 ?c2 ?c3 (
  ?c1[rdfs:subClassOf->?c3] :- And(
    ?c1[rdfs:subClassOf->?c2]
    ?c2[rdfs:subClassOf->?c3] ))

(* <#prp-spo1> *)
Forall ?x ?y ?p2 ?p1 (
  ?x[?p2->?y] :- And(
    ?p1[rdfs:subPropertyOf->?p2]
    {NOTIN}{$MP} ?p2{/NOTIN}
    ?x[?p1->?y] ))

(* <#prp-dom> *)
Forall ?p ?c ?x ?y (
  ?x[rdf:type->?c] :- And(
    ?p[rdfs:domain->?c]
    ?x[?p->?y] ))

(* <#prp-rng> *)
Forall ?p ?c ?x ?y (
  ?y[rdf:type->?c] :- And(
    ?p[rdfs:range->?c]
    ?x[?p->?y] ))

(* <#cax-sco> *)
Forall ?x ?c1 ?c2 (
  ?x[rdf:type->?c2] :- And(
    ?c1[rdfs:subClassOf->?c2]
    ?x[rdf:type->?c1] ))

#EOF
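For readers who want to see what the six Par-CoreRDFS rules compute, the following is a minimal sequential sketch, not the inference engine used in the evaluation. It assumes triples are Python tuples of prefixed names and applies the rules naively to a fixpoint. The function names `step` and `closure` are hypothetical.

```python
SPO = "rdfs:subPropertyOf"
SCO = "rdfs:subClassOf"
DOM = "rdfs:domain"
RNG = "rdfs:range"
TYPE = "rdf:type"
MP = {DOM, RNG, SPO, SCO}  # the $MP list of the Par-CoreRDFS ruleset


def step(facts):
    """Apply each rule once to every ordered pair of facts; return new facts only."""
    new = set()
    for (s, p, o) in facts:
        for (s2, p2, o2) in facts:
            if p == SPO and p2 == SPO and o == s2:
                new.add((s, SPO, o2))        # scm-spo
            if p == SCO and p2 == SCO and o == s2:
                new.add((s, SCO, o2))        # scm-sco
            if p == SPO and p2 == s and o not in MP:
                new.add((s2, o, o2))         # prp-spo1, restricted by NOTIN $MP
            if p == DOM and p2 == s:
                new.add((s2, TYPE, o))       # prp-dom
            if p == RNG and p2 == s:
                new.add((o2, TYPE, o))       # prp-rng
            if p == SCO and p2 == TYPE and o2 == s:
                new.add((s2, TYPE, o))       # cax-sco
    return new - facts


def closure(facts):
    """Naive forward chaining until no rule produces a new fact."""
    facts = set(facts)
    while True:
        new = step(facts)
        if not new:
            return facts
        facts |= new


if __name__ == "__main__":
    data = {
        ("ex:hasPet", DOM, "ex:Person"),
        ("ex:Person", SCO, "ex:Agent"),
        ("ex:alice", "ex:hasPet", "ex:rex"),
    }
    for fact in sorted(closure(data)):
        print(fact)
```

The quadratic pairwise loop is chosen for brevity only; it is adequate here because every rule in this ruleset joins at most two facts.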

B.3.2  Par-MemOWL2 

Prefix(rdf <http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
Prefix(rdfs <http://www.w3.org/2000/01/rdf-schema#>)
Prefix(owl <http://www.w3.org/2002/07/owl#>)
Prefix(xsd <http://www.w3.org/2001/XMLSchema#>)
Prefix(rif <http://www.w3.org/2007/rif#>)
Prefix(func <http://www.w3.org/2007/rif-builtin-function#>)
Prefix(pred <http://www.w3.org/2007/rif-builtin-predicate#>)
Prefix(dc <http://purl.org/dc/terms/>)

#DEFINE IN External(pred:list-contains(
#DEFINE /IN ))
#DEFINE NOTIN Not({IN}
#DEFINE /NOTIN {/IN})

#DEFINE $MC List(owl:FunctionalProperty owl:InverseFunctionalProperty
owl:IrreflexiveProperty owl:SymmetricProperty owl:AsymmetricProperty
owl:TransitiveProperty owl:Class owl:ObjectProperty owl:DatatypeProperty
owl:AllDifferent owl:AllDisjointProperties owl:AllDisjointClasses)

#DEFINE $MP List(rdfs:domain rdfs:range rdfs:subPropertyOf owl:equivalentProperty
owl:propertyDisjointWith owl:inverseOf owl:someValuesFrom owl:onProperty
owl:allValuesFrom owl:hasValue owl:onClass rdfs:subClassOf owl:equivalentClass
owl:disjointWith owl:complementOf owl:distinctMembers owl:members
owl:propertyChainAxiom owl:intersectionOf owl:hasKey owl:unionOf owl:oneOf
rdf:first rdf:rest owl:maxCardinality owl:maxQualifiedCardinality)

#DEFINE $MPT List(rdfs:domain rdfs:range rdfs:subPropertyOf owl:equivalentProperty
owl:propertyDisjointWith owl:inverseOf owl:someValuesFrom owl:onProperty
owl:allValuesFrom owl:hasValue owl:onClass rdfs:subClassOf owl:equivalentClass
owl:disjointWith owl:complementOf owl:distinctMembers owl:members
owl:propertyChainAxiom owl:intersectionOf owl:hasKey owl:unionOf owl:oneOf
rdf:first rdf:rest owl:maxCardinality owl:maxQualifiedCardinality rdf:type)


# REPLICATE ONTOLOGY
#PRAGMA REPLICATE And(?s[?p->?o] {IN}{$MP} ?p{/IN})
#PRAGMA REPLICATE And(?s[rdf:type->?o] {IN}{$MC} ?o{/IN})
#PRAGMA REPLICATE And(_markAllTypes(?a ?b))
#PRAGMA REPLICATE And(_allTypes(?a ?b ?c))
#PRAGMA REPLICATE And(_checkUnionOf(?a ?b))
#PRAGMA REPLICATE And(_checkDifferent(?a ?b))
#PRAGMA REPLICATE And(_checkDisjointProperties(?a ?b))
#PRAGMA REPLICATE And(_checkDisjointClasses(?a ?b))
#PRAGMA REPLICATE And(_markCheckChain(?a ?b))
#PRAGMA REPLICATE And(_checkChain(?a ?b ?c ?d))
#PRAGMA REPLICATE And(_markSameKey(?a ?b))
#PRAGMA REPLICATE And(_sameKey(?a ?b ?c ?d))

# REPLICATE BUILTINS (pred:list-contains special, don't worry about it)

# REPLICATE SELECTIVE PATTERNS
#PRAGMA REPLICATE And(?a = ?b)

#PRAGMA ARBITRARY And(?s[?p->?o] {NOTIN}{$MPT} ?p{/NOTIN})
#PRAGMA ARBITRARY And(?s[rdf:type->?o] {NOTIN}{$MC} ?o{/NOTIN})
(* <#scm-int> *)
Forall ?c ?l (
  _markAllTypes(?c ?l) :- ?c[owl:intersectionOf->?l] )

Forall ?c ?l ?r (
  _markAllTypes(?c ?l) :- And(
    _markAllTypes(?c ?r)
    ?r[rdf:rest->?l]
    Not(?l = rdf:nil) ))

Forall ?c ?ci ?l (
  ?c[rdfs:subClassOf->?ci] :- And(
    _markAllTypes(?c ?l)
    ?l[rdf:first->?ci] ))

(* <#scm-uni> *)
Forall ?c ?l (
  _checkUnionOf(?c ?l) :- ?c[owl:unionOf->?l] )

Forall ?c ?l ?r (
  _checkUnionOf(?c ?l) :- And(
    _checkUnionOf(?c ?r)
    ?r[rdf:rest->?l]
    Not(?l = rdf:nil) ))

Forall ?c ?ci ?l (
  ?ci[rdfs:subClassOf->?c] :- And(
    _checkUnionOf(?c ?l)
    ?l[rdf:first->?ci] ))

#UNINTERESTING
#(* <#scm-cls> *)
#Forall ?c (
#  ?c[rdfs:subClassOf->?c] :- ?c[rdf:type->owl:Class])

#UNINTERESTING
#(* <#scm-cls1> *)
#Forall ?c (
#  ?c[owl:equivalentClass->?c] :- ?c[rdf:type->owl:Class])

#NOT WORTH THE MEMORY
#(* <#scm-cls2> *)
#Forall ?c (
#  ?c[rdfs:subClassOf->owl:Thing] :- ?c[rdf:type->owl:Class])

#NOT WORTH THE MEMORY
#(* <#scm-cls3> *)
#Forall ?c (
#  owl:Nothing[rdfs:subClassOf->?c] :- ?c[rdf:type->owl:Class])

(* <#scm-sco> *)
Forall ?c1 ?c2 ?c3 (
  ?c1[rdfs:subClassOf->?c3] :- And(
    ?c1[rdfs:subClassOf->?c2]
    ?c2[rdfs:subClassOf->?c3] ))

(* <#scm-eqc1> *)
Forall ?c1 ?c2 (
  ?c1[rdfs:subClassOf->?c2] :- ?c1[owl:equivalentClass->?c2])

(* <#scm-eqc11> *)
Forall ?c1 ?c2 (
  ?c2[rdfs:subClassOf->?c1] :- ?c1[owl:equivalentClass->?c2])

(* <#scm-eqc2> *)
Forall ?c1 ?c2 (
  ?c1[owl:equivalentClass->?c2] :- And(
    ?c1[rdfs:subClassOf->?c2]
    ?c2[rdfs:subClassOf->?c1] ))

#UNINTERESTING
#(* <#scm-op> *)
#Forall ?p (
#  ?p[rdfs:subPropertyOf->?p] :- ?p[rdf:type->owl:ObjectProperty])

#UNINTERESTING
#(* <#scm-op1> *)
#Forall ?p (
#  ?p[owl:equivalentProperty->?p] :- ?p[rdf:type->owl:ObjectProperty])

#UNINTERESTING
#(* <#scm-dp> *)
#Forall ?p (
#  ?p[rdfs:subPropertyOf->?p] :- ?p[rdf:type->owl:DatatypeProperty])

#UNINTERESTING
#(* <#scm-dp1> *)
#Forall ?p (
#  ?p[owl:equivalentProperty->?p] :- ?p[rdf:type->owl:DatatypeProperty])

(* <#scm-spo> *)
Forall ?p3 ?p2 ?p1 (
  ?p1[rdfs:subPropertyOf->?p3] :- And(
    ?p1[rdfs:subPropertyOf->?p2]
    ?p2[rdfs:subPropertyOf->?p3] ))

(* <#scm-eqp1> *)
Forall ?p2 ?p1 (
  ?p1[rdfs:subPropertyOf->?p2] :- ?p1[owl:equivalentProperty->?p2])

(* <#scm-eqp11> *)
Forall ?p2 ?p1 (
  ?p2[rdfs:subPropertyOf->?p1] :- ?p1[owl:equivalentProperty->?p2])

(* <#scm-eqp2> *)
Forall ?p2 ?p1 (
  ?p1[owl:equivalentProperty->?p2] :- And(
    ?p1[rdfs:subPropertyOf->?p2]
    ?p2[rdfs:subPropertyOf->?p1] ))

(* <#scm-dom1> *)
Forall ?p ?c1 ?c2 (
  ?p[rdfs:domain->?c2] :- And(
    ?p[rdfs:domain->?c1]
    ?c1[rdfs:subClassOf->?c2] ))

(* <#scm-dom2> *)
Forall ?c ?p2 ?p1 (
  ?p1[rdfs:domain->?c] :- And(
    ?p2[rdfs:domain->?c]
    ?p1[rdfs:subPropertyOf->?p2] ))

(* <#scm-rng1> *)
Forall ?p ?c1 ?c2 (
  ?p[rdfs:range->?c2] :- And(
    ?p[rdfs:range->?c1]
    ?c1[rdfs:subClassOf->?c2] ))

(* <#scm-rng2> *)
Forall ?c ?p2 ?p1 (
  ?p1[rdfs:range->?c] :- And(
    ?p2[rdfs:range->?c]
    ?p1[rdfs:subPropertyOf->?p2] ))

(* <#scm-hv> *)
Forall ?c1 ?c2 ?i ?p2 ?p1 (
  ?c1[rdfs:subClassOf->?c2] :- And(
    ?c1[owl:hasValue->?i]
    ?c1[owl:onProperty->?p1]
    ?c2[owl:hasValue->?i]
    ?c2[owl:onProperty->?p2]
    ?p1[rdfs:subPropertyOf->?p2] ))

(* <#scm-svf1> *)
Forall ?p ?y2 ?c1 ?c2 ?y1 (
  ?c1[rdfs:subClassOf->?c2] :- And(
    ?c1[owl:someValuesFrom->?y1]
    ?c1[owl:onProperty->?p]
    ?c2[owl:someValuesFrom->?y2]
    ?c2[owl:onProperty->?p]
    ?y1[rdfs:subClassOf->?y2] ))

(* <#scm-svf2> *)
Forall ?c1 ?c2 ?y ?p2 ?p1 (
  ?c1[rdfs:subClassOf->?c2] :- And(
    ?c1[owl:someValuesFrom->?y]
    ?c1[owl:onProperty->?p1]
    ?c2[owl:someValuesFrom->?y]
    ?c2[owl:onProperty->?p2]
    ?p1[rdfs:subPropertyOf->?p2] ))

(* <#scm-avf1> *)
Forall ?p ?y2 ?c1 ?c2 ?y1 (
  ?c1[rdfs:subClassOf->?c2] :- And(
    ?c1[owl:allValuesFrom->?y1]
    ?c1[owl:onProperty->?p]
    ?c2[owl:allValuesFrom->?y2]
    ?c2[owl:onProperty->?p]
    ?y1[rdfs:subClassOf->?y2] ))

(* <#scm-avf2> *)
Forall ?c1 ?c2 ?y ?p2 ?p1 (
  ?c2[rdfs:subClassOf->?c1] :- And(
    ?c1[owl:allValuesFrom->?y]
    ?c1[owl:onProperty->?p1]
    ?c2[owl:allValuesFrom->?y]
    ?c2[owl:onProperty->?p2]
    ?p1[rdfs:subPropertyOf->?p2] ))

#NOT WORTH THE MEMORY
#(* <#eq-ref> *)
#Forall ?p ?o ?s (
#  ?s[owl:sameAs->?s] :- ?s[?p->?o])

#NOT WORTH THE MEMORY
#(* <#eq-ref1> *)
#Forall ?p ?o ?s (
#  ?p[owl:sameAs->?p] :- ?s[?p->?o])

#NOT WORTH THE MEMORY
#(* <#eq-ref2> *)
#Forall ?p ?o ?s (
#  ?o[owl:sameAs->?o] :- ?s[?p->?o])

(* <#eq-sym> *)
Forall ?x ?y (
  ?y[owl:sameAs->?x] :- ?x[owl:sameAs->?y])

#ELIMINATED
#(* <#eq-trans> *)
#Forall ?x ?z ?y (
#  ?x[owl:sameAs->?z] :- And(
#    ?x[owl:sameAs->?y]
#    ?y[owl:sameAs->?z] ))

#SPLIT
#ELIMINATED
#(* <#eq-rep-s> *)
#Forall ?p ?o ?s ?s2 (
#  ?s2[?p->?o] :- And(
#    ?s[owl:sameAs->?s2]
#    ?s[?p->?o]
#    {NOTIN}{$MPT} ?p{/NOTIN} ))

#ELIMINATED
#Forall ?p ?o ?s ?s2 (
#  ?s2[?p->?o] :- And(
#    ?s[owl:sameAs->?s2]
#    ?s[?p->?o]
#    {IN}{$MP} ?p{/IN} ))

#ELIMINATED
#Forall ?p ?o ?s ?s2 (
#  ?s2[rdf:type->?o] :- And(
#    ?s[owl:sameAs->?s2]
#    ?s[rdf:type->?o]
#    {NOTIN}{$MC} ?o{/NOTIN} ))

#ELIMINATED
#Forall ?p ?o ?s ?s2 (
#  ?s2[rdf:type->?o] :- And(
#    ?s[owl:sameAs->?s2]
#    ?s[rdf:type->?o]
#    {IN}{$MC} ?o{/IN} ))
#SPLIT
#ELIMINATED
#(* <#eq-rep-p> *)
#Forall ?p ?o ?s ?p2 (
#  ?s[?p2->?o] :- And(
#    ?p[owl:sameAs->?p2]
#    ?s[?p->?o]
#    {NOTIN}{$MPT} ?p2{/NOTIN} ))

#ELIMINATED
#Forall ?p ?o ?s ?p2 (
#  ?s[?p2->?o] :- And(
#    ?p[owl:sameAs->?p2]
#    ?s[?p->?o]
#    {IN}{$MP} ?p2{/IN} ))

#ELIMINATED
#Forall ?p ?o ?s ?p2 (
#  ?s[rdf:type->?o] :- And(
#    ?p[owl:sameAs->rdf:type]
#    ?s[?p->?o]
#    {NOTIN}{$MC} ?o{/NOTIN} ))

#ELIMINATED
#Forall ?p ?o ?s ?p2 (
#  ?s[rdf:type->?o] :- And(
#    ?p[owl:sameAs->rdf:type]
#    ?s[?p->?o]
#    {IN}{$MC} ?o{/IN} ))

#SPLIT
#ELIMINATED
#(* <#eq-rep-o> *)
#Forall ?p ?o ?s ?o2 (
#  ?s[?p->?o2] :- And(
#    ?o[owl:sameAs->?o2]
#    ?s[?p->?o]
#    {NOTIN}{$MPT} ?p{/NOTIN} ))

#ELIMINATED
#Forall ?p ?o ?s ?o2 (
#  ?s[?p->?o2] :- And(
#    ?o[owl:sameAs->?o2]
#    ?s[?p->?o]
#    {IN}{$MP} ?p{/IN} ))

#ELIMINATED
#Forall ?p ?o ?s ?o2 (
#  ?s[rdf:type->?o2] :- And(
#    ?o[owl:sameAs->?o2]
#    ?s[rdf:type->?o]
#    {NOTIN}{$MC} ?o2{/NOTIN} ))

#ELIMINATED
#Forall ?p ?o ?s ?o2 (
#  ?s[rdf:type->?o2] :- And(
#    ?o[owl:sameAs->?o2]
#    ?s[rdf:type->?o]
#    {IN}{$MC} ?o2{/IN} ))

#ELIMINATED
#(* <#eq-diff1> *)
#Forall ?x ?y (
#  rif:error() :- And(
#    ?x[owl:sameAs->?y]
#    ?x[owl:differentFrom->?y] ))

#UNINTERESTING
#(* <#prp-ap-label> *)
#  rdfs:label[rdf:type->owl:AnnotationProperty]

#UNINTERESTING
#(* <#prp-ap-comment> *)
#  rdfs:comment[rdf:type->owl:AnnotationProperty]

#UNINTERESTING
#(* <#prp-ap-seeAlso> *)
#  rdfs:seeAlso[rdf:type->owl:AnnotationProperty]

#UNINTERESTING
#(* <#prp-ap-isDefinedBy> *)
#  rdfs:isDefinedBy[rdf:type->owl:AnnotationProperty]

#UNINTERESTING
#(* <#prp-ap-deprecated> *)
#  owl:deprecated[rdf:type->owl:AnnotationProperty]

#UNINTERESTING
#(* <#prp-ap-priorVersion> *)
#  owl:priorVersion[rdf:type->owl:AnnotationProperty]

#UNINTERESTING
#(* <#prp-ap-backwardCompatibleWith> *)
#  owl:backwardCompatibleWith[rdf:type->owl:AnnotationProperty]

#UNINTERESTING
#(* <#prp-ap-incompatibleWith> *)
#  owl:incompatibleWith[rdf:type->owl:AnnotationProperty]

#SPLIT
(* <#prp-dom> *)
Forall ?p ?c ?x ?y (
  ?x[rdf:type->?c] :- And(
    ?p[rdfs:domain->?c]
    ?x[?p->?y]
    {NOTIN}{$MC} ?c{/NOTIN} ))

#ELIMINATED
#Forall ?p ?c ?x ?y (
#  ?x[rdf:type->?c] :- And(
#    ?p[rdfs:domain->?c]
#    ?x[?p->?y]
#    {IN}{$MC} ?c{/IN} ))

#SPLIT
(* <#prp-rng> *)
Forall ?p ?c ?x ?y (
  ?y[rdf:type->?c] :- And(
    ?p[rdfs:range->?c]
    ?x[?p->?y]
    {NOTIN}{$MC} ?c{/NOTIN} ))

#ELIMINATED
#(* <#prp-rng> *)
#Forall ?p ?c ?x ?y (
#  ?y[rdf:type->?c] :- And(
#    ?p[rdfs:range->?c]
#    ?x[?p->?y]
#    {IN}{$MC} ?c{/IN} ))

#ELIMINATED
#(* <#prp-fp> *)
#Forall ?p ?y2 ?x ?y1 (
#  ?y1[owl:sameAs->?y2] :- And(
#    ?p[rdf:type->owl:FunctionalProperty]
#    ?x[?p->?y1]
#    ?x[?p->?y2] ))

#ELIMINATED
#(* <#prp-ifp> *)
#Forall ?p ?x1 ?x2 ?y (
#  ?x1[owl:sameAs->?x2] :- And(
#    ?p[rdf:type->owl:InverseFunctionalProperty]
#    ?x1[?p->?y]
#    ?x2[?p->?y] ))

(* <#prp-irp> *)
Forall ?p ?x (
  rif:error() :- And(
    ?p[rdf:type->owl:IrreflexiveProperty]
    ?x[?p->?x] ))


#SPLIT
(* <#prp-symp> *)
Forall ?p ?x ?y (
  ?y[?p->?x] :- And(
    ?p[rdf:type->owl:SymmetricProperty]
    ?x[?p->?y]
    {NOTIN}{$MPT} ?p{/NOTIN} ))

Forall ?p ?x ?y (
  ?y[?p->?x] :- And(
    ?p[rdf:type->owl:SymmetricProperty]
    {IN}{$MP} ?p{/IN}
    ?x[?p->?y] ))

Forall ?p ?x ?y (
  ?y[rdf:type->?x] :- And(
    rdf:type[rdf:type->owl:SymmetricProperty]
    ?x[rdf:type->?y]
    {NOTIN}{$MC} ?x{/NOTIN} ))

#ELIMINATED
#Forall ?p ?x ?y (
#  ?y[rdf:type->?x] :- And(
#    rdf:type[rdf:type->owl:SymmetricProperty]
#    ?x[rdf:type->?y]
#    {IN}{$MC} ?x{/IN} ))

#ELIMINATED
#(* <#prp-asyp> *)
#Forall ?p ?x ?y (
#  rif:error() :- And(
#    ?p[rdf:type->owl:AsymmetricProperty]
#    ?x[?p->?y]
#    ?y[?p->?x] ))


#SPLIT
#ELIMINATED
#(* <#prp-trp> *)
#Forall ?p ?x ?z ?y (
#  ?x[?p->?z] :- And(
#    ?p[rdf:type->owl:TransitiveProperty]
#    ?x[?p->?y]
#    ?y[?p->?z]
#    {NOTIN}{$MPT} ?p{/NOTIN} ))

Forall ?p ?x ?z ?y (
  ?x[?p->?z] :- And(
    ?p[rdf:type->owl:TransitiveProperty]
    {IN}{$MP} ?p{/IN}
    ?x[?p->?y]
    ?y[?p->?z] ))

#ELIMINATED
#Forall ?p ?x ?z ?y (
#  ?x[rdf:type->?z] :- And(
#    rdf:type[rdf:type->owl:TransitiveProperty]
#    ?x[rdf:type->?y]
#    ?y[rdf:type->?z]
#    {NOTIN}{$MC} ?z{/NOTIN} ))

#ELIMINATED
#Forall ?p ?x ?z ?y (
#  ?x[rdf:type->?z] :- And(
#    rdf:type[rdf:type->owl:TransitiveProperty]
#    ?x[rdf:type->?y]
#    ?y[rdf:type->?z]
#    {IN}{$MC} ?z{/IN} ))

#SPLIT
(* <#prp-spo1> *)
Forall ?x ?y ?p2 ?p1 (
  ?x[?p2->?y] :- And(
    ?p1[rdfs:subPropertyOf->?p2]
    ?x[?p1->?y]
    {NOTIN}{$MPT} ?p2{/NOTIN} ))

#ELIMINATED
#Forall ?x ?y ?p2 ?p1 (
#  ?x[?p2->?y] :- And(
#    ?p1[rdfs:subPropertyOf->?p2]
#    ?x[?p1->?y]
#    {IN}{$MP} ?p2{/IN} ))

Forall ?x ?y ?p2 ?p1 (
  ?x[rdf:type->?y] :- And(
    ?p1[rdfs:subPropertyOf->rdf:type]
    ?x[?p1->?y]
    {NOTIN}{$MC} ?y{/NOTIN} ))

#ELIMINATED
#Forall ?x ?y ?p2 ?p1 (
#  ?x[rdf:type->?y] :- And(
#    ?p1[rdfs:subPropertyOf->rdf:type]
#    ?x[?p1->?y]
#    {IN}{$MC} ?y{/IN} ))

#SPLIT
(* <#prp-eqp1> *)
Forall ?x ?y ?p2 ?p1 (
  ?x[?p2->?y] :- And(
    ?p1[owl:equivalentProperty->?p2]
    ?x[?p1->?y]
    {NOTIN}{$MPT} ?p2{/NOTIN} ))

#ELIMINATED
#Forall ?x ?y ?p2 ?p1 (
#  ?x[?p2->?y] :- And(
#    ?p1[owl:equivalentProperty->?p2]
#    ?x[?p1->?y]
#    {IN}{$MP} ?p2{/IN} ))

Forall ?x ?y ?p2 ?p1 (
  ?x[rdf:type->?y] :- And(
    ?p1[owl:equivalentProperty->rdf:type]
    ?x[?p1->?y]
    {NOTIN}{$MC} ?y{/NOTIN} ))

#ELIMINATED
#Forall ?x ?y ?p2 ?p1 (
#  ?x[rdf:type->?y] :- And(
#    ?p1[owl:equivalentProperty->rdf:type]
#    ?x[?p1->?y]
#    {IN}{$MC} ?y{/IN} ))

#SPLIT
(* <#prp-eqp2> *)
Forall ?x ?y ?p2 ?p1 (
  ?x[?p1->?y] :- And(
    ?p1[owl:equivalentProperty->?p2]
    ?x[?p2->?y]
    {NOTIN}{$MPT} ?p1{/NOTIN} ))

#ELIMINATED
#Forall ?x ?y ?p2 ?p1 (
#  ?x[?p1->?y] :- And(
#    ?p1[owl:equivalentProperty->?p2]
#    ?x[?p2->?y]
#    {IN}{$MP} ?p1{/IN} ))

Forall ?x ?y ?p2 ?p1 (
  ?x[rdf:type->?y] :- And(
    rdf:type[owl:equivalentProperty->?p2]
    ?x[?p2->?y]
    {NOTIN}{$MC} ?y{/NOTIN} ))

#ELIMINATED
#Forall ?x ?y ?p2 ?p1 (
#  ?x[rdf:type->?y] :- And(
#    rdf:type[owl:equivalentProperty->?p2]
#    ?x[?p2->?y]
#    {IN}{$MC} ?y{/IN} ))

#ELIMINATED
#(* <#prp-pdw> *)
#Forall ?x ?y ?p2 ?p1 (
#  rif:error() :- And(
#    ?p1[owl:propertyDisjointWith->?p2]
#    ?x[?p1->?y]
#    ?x[?p2->?y] ))

#SPLIT
(* <#prp-inv1> *)
Forall ?x ?y ?p2 ?p1 (
  ?y[?p2->?x] :- And(
    ?p1[owl:inverseOf->?p2]
    ?x[?p1->?y]
    {NOTIN}{$MPT} ?p2{/NOTIN} ))

#ELIMINATED
#Forall ?x ?y ?p2 ?p1 (
#  ?y[?p2->?x] :- And(
#    ?p1[owl:inverseOf->?p2]
#    ?x[?p1->?y]
#    {IN}{$MP} ?p2{/IN} ))

Forall ?x ?y ?p2 ?p1 (
  ?y[rdf:type->?x] :- And(
    ?p1[owl:inverseOf->rdf:type]
    ?x[?p1->?y]
    {NOTIN}{$MC} ?x{/NOTIN} ))

#ELIMINATED
#Forall ?x ?y ?p2 ?p1 (
#  ?y[rdf:type->?x] :- And(
#    ?p1[owl:inverseOf->rdf:type]
#    ?x[?p1->?y]
#    {IN}{$MC} ?x{/IN} ))

#SPLIT
(* <#prp-inv2> *)
Forall ?x ?y ?p2 ?p1 (
  ?y[?p1->?x] :- And(
    ?p1[owl:inverseOf->?p2]
    ?x[?p2->?y]
    {NOTIN}{$MPT} ?p1{/NOTIN} ))

#ELIMINATED
#Forall ?x ?y ?p2 ?p1 (
#  ?y[?p1->?x] :- And(
#    ?p1[owl:inverseOf->?p2]
#    ?x[?p2->?y]
#    {IN}{$MP} ?p1{/IN} ))

Forall ?x ?y ?p2 ?p1 (
  ?y[rdf:type->?x] :- And(
    rdf:type[owl:inverseOf->?p2]
    ?x[?p2->?y]
    {NOTIN}{$MC} ?x{/NOTIN} ))

#ELIMINATED
#Forall ?x ?y ?p2 ?p1 (
#  ?y[rdf:type->?x] :- And(
#    rdf:type[owl:inverseOf->?p2]
#    ?x[?p2->?y]
#    {IN}{$MC} ?x{/IN} ))

#UNINTERESTING
#(* <#cls-thing> *)
#  owl:Thing[rdf:type->owl:Class]

#UNINTERESTING
#(* <#cls-nothing1> *)
#  owl:Nothing[rdf:type->owl:Class]

(* <#cls-nothing2> *)
Forall ?x (
  rif:error() :- ?x[rdf:type->owl:Nothing])

#SPLIT
#ELIMINATED
#(* <#cls-svf1> *)
#Forall ?p ?v ?u ?x ?y (
#  ?u[rdf:type->?x] :- And(
#    ?x[owl:someValuesFrom->?y]
#    ?x[owl:onProperty->?p]
#    ?u[?p->?v]
#    ?v[rdf:type->?y]
#    {NOTIN}{$MC} ?x{/NOTIN} ))

#ELIMINATED
#Forall ?p ?v ?u ?x ?y (
#  ?u[rdf:type->?x] :- And(
#    ?x[owl:someValuesFrom->?y]
#    ?x[owl:onProperty->?p]
#    ?u[?p->?v]
#    ?v[rdf:type->?y]
#    {IN}{$MC} ?x{/IN} ))

#SPLIT
(* <#cls-svf2> *)
Forall ?p ?v ?u ?x (
  ?u[rdf:type->?x] :- And(
    ?x[owl:someValuesFrom->owl:Thing]
    ?x[owl:onProperty->?p]
    ?u[?p->?v]
    {NOTIN}{$MC} ?x{/NOTIN} ))

#ELIMINATED
#Forall ?p ?v ?u ?x (
#  ?u[rdf:type->?x] :- And(
#    ?x[owl:someValuesFrom->owl:Thing]
#    ?x[owl:onProperty->?p]
#    ?u[?p->?v]
#    {IN}{$MC} ?x{/IN} ))

#SPLIT
#ELIMINATED
#(* <#cls-avf> *)
#Forall ?p ?v ?u ?x ?y (
#  ?v[rdf:type->?y] :- And(
#    ?x[owl:allValuesFrom->?y]
#    ?x[owl:onProperty->?p]
#    ?u[rdf:type->?x]
#    ?u[?p->?v]
#    {NOTIN}{$MC} ?y{/NOTIN} ))

#ELIMINATED
#Forall ?p ?v ?u ?x ?y (
#  ?v[rdf:type->?y] :- And(
#    ?x[owl:allValuesFrom->?y]
#    ?x[owl:onProperty->?p]
#    ?u[rdf:type->?x]
#    ?u[?p->?v]
#    {IN}{$MC} ?y{/IN} ))

#SPLIT
(* <#cls-hv1> *)
Forall ?p ?u ?x ?y (
  ?u[?p->?y] :- And(
    ?x[owl:hasValue->?y]
    ?x[owl:onProperty->?p]
    ?u[rdf:type->?x]
    {NOTIN}{$MPT} ?p{/NOTIN} ))

#ELIMINATED
#Forall ?p ?u ?x ?y (
#  ?u[?p->?y] :- And(
#    ?x[owl:hasValue->?y]
#    ?x[owl:onProperty->?p]
#    ?u[rdf:type->?x]
#    {IN}{$MP} ?p{/IN} ))

Forall ?p ?u ?x ?y (
  ?u[rdf:type->?y] :- And(
    ?x[owl:hasValue->?y]
    ?x[owl:onProperty->rdf:type]
    ?u[rdf:type->?x]
    {NOTIN}{$MC} ?y{/NOTIN} ))

#ELIMINATED
#Forall ?p ?u ?x ?y (
#  ?u[rdf:type->?y] :- And(
#    ?x[owl:hasValue->?y]
#    ?x[owl:onProperty->rdf:type]
#    ?u[rdf:type->?x]
#    {IN}{$MC} ?y{/IN} ))


#SPLIT
(* <#cls-hv2> *)
Forall ?p ?u ?x ?y (
  ?u[rdf:type->?x] :- And(
    ?x[owl:hasValue->?y]
    ?x[owl:onProperty->?p]
    ?u[?p->?y]
    {NOTIN}{$MC} ?x{/NOTIN} ))

#ELIMINATED
#Forall ?p ?u ?x ?y (
#  ?u[rdf:type->?x] :- And(
#    ?x[owl:hasValue->?y]
#    ?x[owl:onProperty->?p]
#    ?u[?p->?y]
#    {IN}{$MC} ?x{/IN} ))

#ELIMINATED
#(* <#cls-maxc1> *)
#Forall ?p ?u ?x ?y (
#  rif:error() :- And(
#    ?x[owl:maxCardinality->0]
#    ?x[owl:onProperty->?p]
#    ?u[rdf:type->?x]
#    ?u[?p->?y] ))

#ELIMINATED
#(* <#cls-maxc2> *)
#Forall ?p ?y2 ?u ?x ?y1 (
#  ?y1[owl:sameAs->?y2] :- And(
#    ?x[owl:maxCardinality->1]
#    ?x[owl:onProperty->?p]
#    ?u[rdf:type->?x]
#    ?u[?p->?y1]
#    ?u[?p->?y2] ))

#ELIMINATED
#(* <#cls-maxqc1> *)
#Forall ?p ?c ?u ?x ?y (
#  rif:error() :- And(
#    ?x[owl:maxQualifiedCardinality->0]
#    ?x[owl:onProperty->?p]
#    ?x[owl:onClass->?c]
#    ?u[rdf:type->?x]
#    ?u[?p->?y]
#    ?y[rdf:type->?c] ))

#ELIMINATED
#(* <#cls-maxqc2> *)
#Forall ?p ?u ?x ?y (
#  rif:error() :- And(
#    ?x[owl:maxQualifiedCardinality->0]
#    ?x[owl:onProperty->?p]
#    ?x[owl:onClass->owl:Thing]
#    ?u[rdf:type->?x]
#    ?u[?p->?y] ))

#ELIMINATED
#(* <#cls-maxqc3> *)
#Forall ?p ?y2 ?c ?u ?x ?y1 (
#  ?y1[owl:sameAs->?y2] :- And(
#    ?x[owl:maxQualifiedCardinality->1]
#    ?x[owl:onProperty->?p]
#    ?x[owl:onClass->?c]
#    ?u[rdf:type->?x]
#    ?u[?p->?y1]
#    ?y1[rdf:type->?c]
#    ?u[?p->?y2]
#    ?y2[rdf:type->?c] ))

#ELIMINATED
#(* <#cls-maxqc4> *)
#Forall ?p ?y2 ?u ?x ?y1 (
#  ?y1[owl:sameAs->?y2] :- And(
#    ?x[owl:maxQualifiedCardinality->1]
#    ?x[owl:onProperty->?p]
#    ?x[owl:onClass->owl:Thing]
#    ?u[rdf:type->?x]
#    ?u[?p->?y1]
#    ?u[?p->?y2] ))

#SPLIT
(* <#cax-sco> *)
Forall ?x ?c1 ?c2 (
  ?x[rdf:type->?c2] :- And(
    ?c1[rdfs:subClassOf->?c2]
    ?x[rdf:type->?c1]
    {NOTIN}{$MC} ?c2{/NOTIN} ))

#ELIMINATED
#Forall ?x ?c1 ?c2 (
#  ?x[rdf:type->?c2] :- And(
#    ?c1[rdfs:subClassOf->?c2]
#    ?x[rdf:type->?c1]
#    {IN}{$MC} ?c2{/IN} ))
#SPLIT
(* <#cax-eqc1> *)
Forall ?x ?c1 ?c2 (
  ?x[rdf:type->?c2] :- And(
    ?c1[owl:equivalentClass->?c2]
    ?x[rdf:type->?c1]
    {NOTIN}{$MC} ?c2{/NOTIN} ))

#ELIMINATED
#Forall ?x ?c1 ?c2 (
#  ?x[rdf:type->?c2] :- And(
#    ?c1[owl:equivalentClass->?c2]
#    ?x[rdf:type->?c1]
#    {IN}{$MC} ?c2{/IN} ))

#SPLIT
(* <#cax-eqc2> *)
Forall ?x ?c1 ?c2 (
  ?x[rdf:type->?c1] :- And(
    ?c1[owl:equivalentClass->?c2]
    ?x[rdf:type->?c2]
    {NOTIN}{$MC} ?c1{/NOTIN} ))

#ELIMINATED
#Forall ?x ?c1 ?c2 (
#  ?x[rdf:type->?c1] :- And(
#    ?c1[owl:equivalentClass->?c2]
#    ?x[rdf:type->?c2]
#    {IN}{$MC} ?c1{/IN} ))

#ELIMINATED
#(* <#cax-dw> *)
#Forall ?x ?c1 ?c2 (
#  rif:error() :- And(
#    ?c1[owl:disjointWith->?c2]
#    ?x[rdf:type->?c1]
#    ?x[rdf:type->?c2] ))

#ELIMINATED
#(* <#prp-npa1> *)
#Forall ?x ?i1 ?p ?i2 (
#  rif:error() :- And(
#    ?x[owl:sourceIndividual->?i1]
#    ?x[owl:assertionProperty->?p]
#    ?x[owl:targetIndividual->?i2]
#    ?i1[?p->?i2] ))

#ELIMINATED
#(* <#prp-npa2> *)
#Forall ?x ?i ?p ?lt (
#  rif:error() :- And(
#    ?x[owl:sourceIndividual->?i]
#    ?x[owl:assertionProperty->?p]
#    ?x[owl:targetValue->?lt]
#    ?i[?p->?lt] ))

#ELIMINATED
#(* <#cax-dw> *)
#Forall ?c1 ?c2 ?x (
#  rif:error() :- And(
#    ?c1[owl:disjointWith->?c2]
#    ?x[rdf:type->?c1]
#    ?x[rdf:type->?c2] ))

#ELIMINATED
#(* <#cls-com> *)
#Forall ?c1 ?c2 ?x (
#  rif:error() :- And(
#    ?c1[owl:complementOf->?c2]
#    ?x[rdf:type->?c1]
#    ?x[rdf:type->?c2] ))


(* <#eq-diff2-3> *)
Forall ?x ?y (
  _checkDifferent(?x ?y) :- And(
    ?x[rdf:type->owl:AllDifferent]
    ?x[owl:distinctMembers->?y] ))

Forall ?x ?y (
  _checkDifferent(?x ?y) :- And(
    ?x[rdf:type->owl:AllDifferent]
    ?x[owl:members->?y] ))

Forall ?x ?y ?z (
  _checkDifferent(?x ?y) :- And(
    _checkDifferent(?x ?z)
    ?z[rdf:rest->?y]
    Not(?y = rdf:nil) ))

Forall ?x ?y1 ?y2 ?z1 ?z2 (
  rif:error() :- And(
    _checkDifferent(?x ?y1)
    _checkDifferent(?x ?y2)
    Not(?y1 = ?y2)
    ?y1[rdf:first->?z1]
    ?y2[rdf:first->?z2]
    ?z1[owl:sameAs->?z2] ))

#AUXILIARY ELIMINATION
#(* <#prp-adp> *)
# Forall ?r ?l (
#   _checkDisjointProperties(?r ?l) :- And(
#     ?r[rdf:type->owl:AllDisjointProperties]
#     ?r[owl:members -> ?l] ))

#AUXILIARY ELIMINATION
# Forall ?r ?l ?x (
#   _checkDisjointProperties(?r ?l) :- And(
#     _checkDisjointProperties(?r ?x)
#     ?x[rdf:rest->?l]
#     Not(?l = rdf:nil) ))

#ELIMINATED
# Forall ?x ?y ?o ?v ?l1 ?l2 ?r (
#   rif:error() :- And(
#     _checkDisjointProperties(?r ?l1)
#     _checkDisjointProperties(?r ?l2)
#     Not(?l1 = ?l2)
#     ?l1[rdf:first->?x]
#     ?l2[rdf:first->?y]
#     ?o[?x->?v]
#     ?o[?y->?v] ))

#AUXILIARY ELIMINATION
#(* <#cax-adc> *)
# Forall ?r ?l (
#   _checkDisjointClasses(?r ?l) :- And(
#     ?r[rdf:type -> owl:AllDisjointClasses]
#     ?r[owl:members -> ?l] ))

#AUXILIARY ELIMINATION
# Forall ?r ?l ?x (
#   _checkDisjointClasses(?r ?l) :- And(
#     _checkDisjointClasses(?r ?x)
#     ?x[rdf:rest->?l]
#     Not(?l = rdf:nil) ))

#ELIMINATED
# Forall ?x ?y ?o ?l1 ?l2 ?r (
#   rif:error() :- And(
#     _checkDisjointClasses(?r ?l1)
#     _checkDisjointClasses(?r ?l2)
#     Not(?l1 = ?l2)
#     ?l1[rdf:first->?x]
#     ?l2[rdf:first->?y]
#     ?o[rdf:type->?x]
#     ?o[rdf:type->?y] ))

#AUXILIARY ELIMINATION
#(* <#prp-spo2> *)
# Forall ?p ?pc (
#   _markCheckChain(?p ?pc) :- ?p[owl:propertyChainAxiom->?pc] )

#AUXILIARY ELIMINATION
# Forall ?p ?pc (
#   _markCheckChain(?p ?pc) :- And(
#     _markCheckChain(?p ?q)
#     ?q[rdf:rest->?pc]
#     Not(?pc = rdf:nil) ))

#ELIMINATED
# Forall ?q ?start ?pc ?last ?p (
#   _checkChain(?q ?start ?pc ?last) :- And(
#     _markCheckChain(?q ?pc)
#     ?pc[rdf:first->?p]
#     ?pc[rdf:rest->rdf:nil]
#     ?start[?p->?last] ))

#ELIMINATED
# Forall ?q ?start ?pc ?last ?p ?tl (
#   _checkChain(?q ?start ?pc ?last) :- And(
#     ?pc[rdf:first->?p]
#     ?pc[rdf:rest->?tl]
#     ?start[?p->?next]
#     _checkChain(?q ?next ?tl ?last) ))

#SPLIT
#AUXILIARY ELIMINATION
# Forall ?p ?last ?pc ?start (
#   ?start[?p->?last] :- And(
#     ?p[owl:propertyChainAxiom->?pc]
#     _checkChain(?p ?start ?pc ?last)
#     {NOTIN}{$MPT} ?p{/NOTIN} ))

#AUXILIARY ELIMINATION
# Forall ?p ?last ?pc ?start (
#   ?start[?p->?last] :- And(
#     ?p[owl:propertyChainAxiom->?pc]
#     _checkChain(?p ?start ?pc ?last)
#     {IN}{$MP} ?p{/IN} ))

#AUXILIARY ELIMINATION
# Forall ?p ?last ?pc ?start (
#   ?start[rdf:type->?last] :- And(
#     rdf:type[owl:propertyChainAxiom->?pc]
#     _checkChain(rdf:type ?start ?pc ?last)
#     {NOTIN}{$MC} ?last{/NOTIN} ))

##AUXILIARY ELIMINATION
# Forall ?p ?last ?pc ?start (
#   ?start[rdf:type->?last] :- And(
#     rdf:type[owl:propertyChainAxiom->?pc]
#     _checkChain(rdf:type ?start ?pc ?last)
#     {IN}{$MC} ?last{/IN} ))

#ELIMINATED
#(* <#cls-int1> *)
# Forall ?c ?l ?y ?ty (
#   _allTypes(?c ?l ?y) :- And(
#     _markAllTypes(?c ?l)
#     ?l[rdf:first->?ty]
#     ?l[rdf:rest->rdf:nil]
#     ?y[rdf:type->?ty] ))

#ELIMINATED
# Forall ?c ?l ?y ?ty ?tl (
#   _allTypes(?c ?l ?y) :- And(
#     ?l[rdf:first->?ty]
#     ?l[rdf:rest->?tl]
#     ?y[rdf:type->?ty]
#     _allTypes(?c ?tl ?y) ))

#SPLIT
#AUXILIARY ELIMINATION
# Forall ?y ?c ?l (
#   ?y[rdf:type->?c] :- And(
#     ?c[owl:intersectionOf->?l]
#     _allTypes(?c ?l ?y)
#     {NOTIN}{$MC} ?c{/NOTIN} ))

#AUXILIARY ELIMINATION
# Forall ?y ?c ?l (
#   ?y[rdf:type->?c] :- And(
#     ?c[owl:intersectionOf->?l]
#     _allTypes(?c ?l ?y)
#     {IN}{$MC} ?c{/IN} ))

#AUXILIARY ELIMINATION
#(* <#prp-key> *)
# Forall ?c ?u (
#   _markSameKey(?c ?u) :- ?c[owl:hasKey->?u] )

#AUXILIARY ELIMINATION
# Forall ?c ?u ?v (
#   _markSameKey(?c ?u) :- And(
#     _markSameKey(?c ?v)
#     ?v[rdf:rest->?u]
#     Not(?u = rdf:nil) ))

#ELIMINATED
# Forall ?c ?u ?x ?y (
#   _sameKey(?c ?u ?x ?y) :- And(
#     _markSameKey(?c ?u)
#     ?u[rdf:first->?key]
#     ?u[rdf:rest->rdf:nil]
#     ?x[?key->?v] ?y[?key->?v] ))

#ELIMINATED
# Forall ?c ?u ?x ?y (
#   _sameKey(?c ?u ?x ?y) :- And(
#     ?u[rdf:first->?key]
#     ?u[rdf:rest->?tl]
#     ?x[?key->?v] ?y[?key->?v]
#     _sameKey(?c ?tl ?x ?y) ))

#ELIMINATED
# Forall ?x ?y ?c ?u (
#   ?x[owl:sameAs->?y] :- And(
#     ?c[owl:hasKey->?u] ?x[rdf:type->?c] ?y[rdf:type->?c]
#     _sameKey(?c ?u ?x ?y) ))

#SPLIT
(* <#cls-uni> *)
Forall ?y ?c ?l ?ci (
  ?y[rdf:type->?c] :- And(
    _checkUnionOf(?c ?l)
    ?l[rdf:first->?ci]
    ?y[rdf:type->?ci]
    {NOTIN}{$MC} ?c{/NOTIN} ))

#ELIMINATED
# Forall ?y ?c ?l ?ci (
#   ?y[rdf:type->?c] :- And(
#     _checkUnionOf(?c ?l)
#     ?l[rdf:first->?ci]
#     ?y[rdf:type->?ci]
#     {IN}{$MC} ?c{/IN} ))

(* <#cls-oo> *)
Forall ?c ?l (
  _checkOneOf(?c ?l) :- ?c[owl:oneOf->?l] )

Forall ?c ?l ?r (
  _checkOneOf(?c ?l) :- And(
    _checkOneOf(?c ?r)
    ?r[rdf:rest->?l]
    Not(?l = rdf:nil) ))

#SPLIT
Forall ?yi ?c ?l (
  ?yi[rdf:type->?c] :- And(
    _checkOneOf(?c ?l)
    ?l[rdf:first->?yi]
    {NOTIN}{$MC} ?c{/NOTIN} ))

Forall ?yi ?c ?l (
  ?yi[rdf:type->?c] :- And(
    _checkOneOf(?c ?l)
    ?l[rdf:first->?yi]
    {IN}{$MC} ?c{/IN} ))


#SPLIT
(* <#cls-int2> *)
Forall ?y ?c ?ci ?l (
  ?y[rdf:type->?ci] :- And(
    _markAllTypes(?c ?l)
    ?l[rdf:first->?ci]
    ?y[rdf:type->?c]
    {NOTIN}{$MC} ?ci{/NOTIN} ))

#ELIMINATED
# Forall ?y ?c ?ci ?l (
#   ?y[rdf:type->?ci] :- And(
#     _markAllTypes(?c ?l)
#     ?l[rdf:first->?ci]
#     ?y[rdf:type->?c]
#     {IN}{$MC} ?ci{/IN} ))

#EOF
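The #DEFINE lines in the listing above appear to act as textual macros that are referenced as {NAME} inside rules and pragmas. The following is a minimal sketch of how such a reference could be expanded, assuming plain textual substitution repeated until no macro reference remains. The names `MACROS`, `MACRO_REF`, and `expand`, and the depth limit, are hypothetical and serve only to illustrate the expansion; they are not taken from the implementation described in this thesis.

```python
import re

# Macro table mirroring the #DEFINE lines of the listing ($MP and $MPT omitted).
MACROS = {
    "IN": "External(pred:list-contains(",
    "/IN": "))",
    "NOTIN": "Not({IN}",
    "/NOTIN": "{/IN})",
    "$MC": (
        "List(owl:FunctionalProperty owl:InverseFunctionalProperty "
        "owl:IrreflexiveProperty owl:SymmetricProperty owl:AsymmetricProperty "
        "owl:TransitiveProperty owl:Class owl:ObjectProperty "
        "owl:DatatypeProperty owl:AllDifferent owl:AllDisjointProperties "
        "owl:AllDisjointClasses)"
    ),
}

# A macro reference is a brace-delimited name without spaces, e.g. {NOTIN}.
MACRO_REF = re.compile(r"\{([^{}\s]+)\}")


def expand(text, macros=MACROS, max_depth=10):
    """Replace {NAME} references repeatedly until the text stops changing.

    Unknown names are left untouched. The depth limit (an assumption) guards
    against macro definitions that refer to each other in a cycle.
    """
    for _ in range(max_depth):
        expanded = MACRO_REF.sub(
            lambda match: macros.get(match.group(1), match.group(0)), text
        )
        if expanded == text:
            return expanded
        text = expanded
    raise ValueError("macro expansion did not terminate")


if __name__ == "__main__":
    # The guard used by the active cax-sco rule.
    print(expand("{NOTIN}{$MC} ?c2{/NOTIN}"))
```

Under this reading, a guard such as {NOTIN}{$MC} ?c2{/NOTIN} becomes a negated call to the RIF builtin pred:list-contains, which tests whether ?c2 occurs in the list of replicated classes.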


